WORLD INTELLECTUAL PROPERTY ORGANIZATION 
International Bureau 




PCT 

INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT) 



(51) International Patent Classification 6: C12N 15/31, C07K 14/315, 16/12, C12Q 1/68

(11) International Publication Number: WO 98/18931 A2

(43) International Publication Date: 7 May 1998 (07.05.98)

(21) International Application Number: PCT/US97/19588

(22) International Filing Date: 30 October 1997 (30.10.97)

(30) Priority Data: 60/029,960    31 October 1996 (31.10.96)    US



(71) Applicant (for all designated States except US): HUMAN
GENOME SCIENCES, INC. [US/US]; 9410 Key West
Avenue, Rockville, MD 20850 (US).

(72) Inventors; and 

(75) Inventors/Applicants (for US only): KUNSCH, Charles, A.
[US/US]; 2398B Dunwoody Crossing, Atlanta, GA 30338
(US). CHOI, Gil, H. [KR/US]; 11429 Potomac Oaks Drive,
Rockville, MD 20850 (US). DILLON, Patrick, J. [US/US];
1055 Snipe Court, Carlsbad, CA 92009 (US). ROSEN,
Craig, A. [US/US]; 22400 Rolling Hill Road, Laytonsville,
MD 20882 (US). BARASH, Steven, C. [US/US]; 582 College
Parkway #303, Rockville, MD 20850 (US). FANNON,
Michael [US/US]; 13501 Rippling Brook Drive, Silver
Spring, MD 20850 (US). DOUGHERTY, Brian, A.
[US/US]; 708 Meadow Field Court, Mount Airy, MD 21771
(US).



(74) Agents: BROOKES, A., Anders et al.; Human Genome 
Sciences, Inc., 9410 Key West Avenue, Rockville, MD 
20850 (US). 



(81) Designated States: AL, AM, AT, AU, AZ, BA, BB, BG, BR, 
BY, CA, CH, CN, CU, CZ, DE, DK, EE, ES, FI, GB, GE, 
GH, HU, ID, IL, IS, JP, KE, KG, KP, KR, KZ, LC, LK, 
LR, LS, LT, LU, LV, MD, MG, MK, MN, MW, MX, NO, 
NZ, PL, PT, RO, RU, SD, SE, SG, SI, SK, SL, TJ, TM, TR, 
TT, UA, UG, US, UZ, VN, YU, ZW, ARIPO patent (GH, 
KE, LS, MW, SD, SZ, UG, ZW), Eurasian patent (AM, AZ, 
BY, KG, KZ, MD, RU, TJ, TM), European patent (AT, BE, 
CH, DE, DK, ES, FI, FR, GB, GR, IE, IT, LU, MC, NL, 
PT, SE), OAPI patent (BF, BJ, CF, CG, CI, CM, GA, GN, 
ML, MR, NE, SN, TD, TG). 



Published 

Without international search report and to be republished 
upon receipt of that report. 



(54) Title: STREPTOCOCCUS PNEUMONIAE POLYNUCLEOTIDES AND SEQUENCES 



[Front-page drawing: block diagram of computer system 102, showing processor 106 and main memory 108 connected to bus 104, and secondary storage devices 110, including a hard drive, a removable medium storage device 114 and a removable storage medium 116.]



(57) Abstract 

The present invention provides polynucleotide sequences of the genome of Streptococcus pneumoniae, polypeptide sequences encoded 
by the polynucleotide sequences, corresponding polynucleotides and polypeptides, vectors and hosts comprising the polynucleotides, and 
assays and other uses thereof. The present invention further provides polynucleotide and polypeptide sequence information stored on 
computer readable media, and computer-based systems and methods which facilitate its use. 



FOR THE PURPOSES OF INFORMATION ONLY 



Codes used to identify States party to the PCT on the front pages of pamphlets publishing international applications under the PCT. 



AL Albania; AM Armenia; AT Austria; AU Australia; AZ Azerbaijan
BA Bosnia and Herzegovina; BB Barbados; BE Belgium; BF Burkina Faso; BG Bulgaria; BJ Benin; BR Brazil; BY Belarus
CA Canada; CF Central African Republic; CG Congo; CH Switzerland; CI Côte d'Ivoire; CM Cameroon; CN China; CU Cuba; CZ Czech Republic
DE Germany; DK Denmark
EE Estonia; ES Spain
FI Finland; FR France
GA Gabon; GB United Kingdom; GE Georgia; GH Ghana; GN Guinea; GR Greece
HU Hungary
IE Ireland; IL Israel; IS Iceland; IT Italy
JP Japan
KE Kenya; KG Kyrgyzstan; KP Democratic People's Republic of Korea; KR Republic of Korea; KZ Kazakstan
LC Saint Lucia; LI Liechtenstein; LK Sri Lanka; LR Liberia; LS Lesotho; LT Lithuania; LU Luxembourg; LV Latvia
MC Monaco; MD Republic of Moldova; MG Madagascar; MK The former Yugoslav Republic of Macedonia; ML Mali; MN Mongolia; MR Mauritania; MW Malawi; MX Mexico
NE Niger; NL Netherlands; NO Norway; NZ New Zealand
PL Poland; PT Portugal
RO Romania; RU Russian Federation
SD Sudan; SE Sweden; SG Singapore; SI Slovenia; SK Slovakia; SN Senegal; SZ Swaziland
TD Chad; TG Togo; TJ Tajikistan; TM Turkmenistan; TR Turkey; TT Trinidad and Tobago
UA Ukraine; UG Uganda; US United States of America; UZ Uzbekistan
VN Viet Nam
YU Yugoslavia
ZW Zimbabwe







WO 98/18931 



1 



PCT/US97/19588 



Streptococcus pneumoniae Polynucleotides and Sequences 

FIELD OF THE INVENTION 

The present invention relates to the field of molecular biology. In
particular, it relates to, among other things, nucleotide sequences of Streptococcus
pneumoniae, contigs, ORFs, fragments, probes, primers and related
polynucleotides thereof, peptides and polypeptides encoded by the sequences, and
uses of the polynucleotides and sequences thereof, such as in fermentation,
polypeptide production, assays and pharmaceutical development, among others.

BACKGROUND OF THE INVENTION 

Streptococcus pneumoniae has been one of the most extensively studied
microorganisms since its first isolation in 1881. It was the object of many
investigations that led to important scientific discoveries. In 1928, Griffith
observed that when heat-killed encapsulated pneumococci and live strains
constitutively lacking any capsule were injected together into mice, the
nonencapsulated strains could be converted into encapsulated pneumococci with the
same capsular type as the heat-killed strain. Years later, the nature of this
"transforming principle," or carrier of genetic information, was shown to be DNA.
(Avery, O.T., et al., J. Exp. Med. 79:137-157 (1944)).

In spite of the vast number of publications on S. pneumoniae, many
questions about its virulence are still unanswered, and this pathogen remains a
major causative agent of serious human disease, especially community-acquired
pneumonia. (Johnston, R.B., et al., Rev. Infect. Dis. 13(Suppl. 6):S509-517
(1991)). In addition, in developing countries, the pneumococcus is responsible for
the death of a large number of children under the age of 5 years from pneumococcal
pneumonia. The incidence of pneumococcal disease is highest in infants under 2
years of age and in people over 60 years of age. Pneumococci are the second most
frequent cause (after Haemophilus influenzae type b) of bacterial meningitis and
otitis media in children. With the recent introduction of conjugate vaccines for H.
influenzae type b, pneumococcal meningitis is likely to become increasingly
prominent. S. pneumoniae is the most important etiologic agent of community-acquired
pneumonia in adults and is the second most common cause of bacterial
meningitis behind Neisseria meningitidis.

The antibiotic generally prescribed to treat S. pneumoniae is
benzylpenicillin, although resistance to this and to other antibiotics is found
occasionally. Pneumococcal resistance to penicillin results from mutations in its
penicillin-binding proteins. In uncomplicated pneumococcal pneumonia caused by
a sensitive strain, treatment with penicillin is usually successful unless started too
late. Erythromycin or clindamycin can be used to treat pneumonia in patients
hypersensitive to penicillin, but strains resistant to these drugs exist. Broad-spectrum
antibiotics (e.g., the tetracyclines) may also be effective, although
tetracycline-resistant strains are not rare. In spite of the availability of antibiotics,
the mortality of pneumococcal bacteremia in the last four decades has remained
stable between 25 and 29%. (Gillespie, S.H., et al., J. Med. Microbiol. 28:237-248
(1989)).

S. pneumoniae is carried in the upper respiratory tract by many healthy
individuals. It has been suggested that attachment of pneumococci is mediated by a
disaccharide receptor on fibronectin, present on human pharyngeal epithelial cells.
(Anderson, B.J., et al., J. Immunol. 142:2464-2468 (1989)). The mechanisms by
which pneumococci translocate from the nasopharynx to the lung, thereby causing
pneumonia, or migrate to the blood, giving rise to bacteremia or septicemia, are
poorly understood. (Johnston, R.B., et al., Rev. Infect. Dis. 13(Suppl. 6):S509-517
(1991)).

Various proteins have been suggested to be involved in the pathogenicity of
S. pneumoniae; however, only a few of them have actually been confirmed as
virulence factors. Pneumococci produce an IgA1 protease that might interfere with
host defense at mucosal surfaces. (Kornfield, S.J., et al., Rev. Infect. Dis. 3:521-534
(1981)). S. pneumoniae also produces neuraminidase, an enzyme that may
facilitate attachment to epithelial cells by cleaving sialic acid from host
glycolipids and gangliosides. Partially purified neuraminidase was observed to
induce meningitis-like symptoms in mice; however, the reliability of this finding
has been questioned because the neuraminidase preparations used were probably
contaminated with cell wall products. Other pneumococcal proteins besides
neuraminidase are involved in the adhesion of pneumococci to epithelial and
endothelial cells. These pneumococcal proteins have not yet been identified.

Recently, Cundell et al. reported that peptide permeases can modulate
pneumococcal adherence to epithelial and endothelial cells. It was, however,
unclear whether these permeases function directly as adhesins or whether they
enhance adherence by modulating the expression of pneumococcal adhesins.
(DeVelasco, E.A., et al., Microbiol. Rev. 59:591-603 (1995)). A better understanding
of the virulence factors determining its pathogenicity will need to be developed to
cope with the devastating effects of pneumococcal disease in humans.

Ironically, despite the prominent role of S. pneumoniae in the discovery that
DNA carries genetic information, little is known about the molecular genetics of the
organism. The S. pneumoniae genome consists of one circular, covalently closed,
double-stranded DNA molecule and a collection of so-called variable accessory elements,
such as prophages, plasmids, transposons and the like. Most physical characteristics
and almost all of the genes of S. pneumoniae are unknown. Among the few that have been
identified, most have not been physically mapped or characterized in detail. Only a
few genes of this organism have been sequenced. (See, for instance, current
versions of GENBANK and other nucleic acid databases, and references that relate
to the genome of S. pneumoniae such as those set out elsewhere herein.)

It is clear that the etiology of diseases mediated or exacerbated by S.
pneumoniae infection involves the programmed expression of S. pneumoniae
genes, and that characterizing the genes and their patterns of expression would add
dramatically to our understanding of the organism and its host interactions.
Knowledge of S. pneumoniae genes and genomic organization would improve our
understanding of disease etiology and lead to improved and new ways of
preventing, ameliorating, arresting and reversing diseases. Moreover,
characterized genes and genomic fragments of S. pneumoniae would provide
reagents for, among other things, detecting, characterizing and controlling S.
pneumoniae infections. There is a need to characterize the genome of S.
pneumoniae and to provide polynucleotides of this organism.

SUMMARY OF THE INVENTION

The present invention is based on the sequencing of fragments of the
Streptococcus pneumoniae genome. The primary nucleotide sequences which were
generated are provided in SEQ ID NOS: 1-391.

The present invention provides the nucleotide sequence of several hundred
contigs of the Streptococcus pneumoniae genome, which are listed in the tables below
and set out in the Sequence Listing submitted herewith, and representative
fragments thereof, in a form which can be readily used, analyzed, and interpreted
by a skilled artisan. In one embodiment, the present invention is provided as
contiguous strings of primary sequence information corresponding to the
nucleotide sequences depicted in SEQ ID NOS: 1-391.

The present invention further provides nucleotide sequences which are at
least 95% identical to the nucleotide sequences of SEQ ID NOS: 1-391.

The nucleotide sequence of SEQ ID NOS: 1-391, a representative fragment
thereof, or a nucleotide sequence which is at least 95% identical to the nucleotide
sequence of SEQ ID NOS: 1-391 may be provided in a variety of mediums to
facilitate its use. In one application of this embodiment, the sequences of the
present invention are recorded on computer readable media. Such media include,
but are not limited to: magnetic storage media, such as floppy discs, hard disc
storage medium, and magnetic tape; optical storage media such as CD-ROM;
electrical storage media such as RAM and ROM; and hybrids of these categories
such as magnetic/optical storage media.

The present invention further provides systems, particularly computer-based
systems, which contain the sequence information herein described stored in a
data storage means. Such systems are designed to identify commercially important
fragments of the Streptococcus pneumoniae genome.

Another embodiment of the present invention is directed to fragments of the
Streptococcus pneumoniae genome having particular structural or functional
attributes. Such fragments of the Streptococcus pneumoniae genome of the present
invention include, but are not limited to, fragments which encode peptides,
hereinafter referred to as open reading frames or ORFs; fragments which modulate
the expression of an operably linked ORF, hereinafter referred to as expression
modulating fragments or EMFs; and fragments which can be used to diagnose the
presence of Streptococcus pneumoniae in a sample, hereinafter referred to as
diagnostic fragments or DFs.

Each of the ORFs in fragments of the Streptococcus pneumoniae genome
disclosed in Tables 1-3, and the EMFs found 5' to the ORFs, can be used in
numerous ways as polynucleotide reagents. For instance, the sequences can be
used as diagnostic probes or amplification primers to detect or determine the
presence of a specific microbe in a sample, to selectively control gene expression in
a host, and to produce polypeptides, such as polypeptides encoded by ORFs of the
present invention, particularly those polypeptides that have a pharmacological activity.

The present invention further includes recombinant constructs comprising
one or more fragments of the Streptococcus pneumoniae genome of the present
invention. The recombinant constructs of the present invention comprise vectors,
such as a plasmid or viral vector, into which a fragment of the Streptococcus
pneumoniae genome has been inserted.

The present invention further provides host cells containing any of the
isolated fragments of the Streptococcus pneumoniae genome of the present
invention. The host cell can be a higher eukaryotic cell, such as a mammalian
cell, a lower eukaryotic cell, such as a yeast cell, or a prokaryotic cell, such as a
bacterial cell.

The present invention is further directed to isolated polypeptides and
proteins encoded by ORFs of the present invention. A variety of methods, well
known to those of skill in the art, routinely may be utilized to obtain any of the
polypeptides and proteins of the present invention. For instance, polypeptides and
proteins of the present invention having relatively short, simple amino acid
sequences readily can be synthesized using commercially available automated
peptide synthesizers. Polypeptides and proteins of the present invention also may
be purified from bacterial cells which naturally produce the protein. Yet another
alternative is to purify polypeptides and proteins of the present invention from cells
which have been altered to express them.

The invention further provides methods of obtaining homologs of the
fragments of the Streptococcus pneumoniae genome of the present invention and
homologs of the proteins encoded by the ORFs of the present invention.
Specifically, by using the nucleotide and amino acid sequences disclosed herein as
probes or primers, and techniques such as PCR cloning and colony/plaque
hybridization, one skilled in the art can obtain homologs.
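
As a rough illustration of how a disclosed sequence can serve as a source of PCR primers, the Python sketch below takes the two ends of a target region, reverse-complements the downstream end, and estimates each primer's melting temperature with the Wallace rule (Tm of about 2(A+T) + 4(G+C) degrees C). The function names, the 20-nucleotide default length and the example target are hypothetical; practical primer design also weighs GC content, secondary structure and specificity, none of which this sketch checks.

```python
# Minimal sketch (hypothetical helpers): derive a forward/reverse PCR primer
# pair from the two ends of a target DNA region.

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def wallace_tm(primer: str) -> int:
    """Approximate melting temperature in deg C by the Wallace rule,
    2*(A+T) + 4*(G+C); only a rough guide for short oligonucleotides."""
    p = primer.upper()
    at = p.count("A") + p.count("T")
    gc = p.count("G") + p.count("C")
    return 2 * at + 4 * gc


def primer_pair(target: str, length: int = 20) -> tuple:
    """Forward primer = first `length` bases of the target; reverse primer =
    reverse complement of the last `length` bases (it anneals to the top strand)."""
    if len(target) < 2 * length:
        raise ValueError("target is too short for two non-overlapping primers")
    forward = target[:length]
    reverse = reverse_complement(target[-length:])
    return forward, reverse


if __name__ == "__main__":
    # Made-up target sequence, not taken from the Sequence Listing.
    target = "ATGACCGGTTACGATCGATTGCAAGGCTTAGCCTAGGATCCATTGACGTAA"
    fwd, rev = primer_pair(target, length=18)
    print("forward", fwd, wallace_tm(fwd))
    print("reverse", rev, wallace_tm(rev))
```

The reverse primer is written 5' to 3', as it would be ordered, which is why it is the reverse complement of the target's final bases rather than a copy of them.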

The invention further provides antibodies which selectively bind
polypeptides and proteins of the present invention. Such antibodies include both
monoclonal and polyclonal antibodies.

The invention further provides hybridomas which produce the above- 
described antibodies. A hybridoma is an immortalized cell line which is capable of 
secreting a specific monoclonal antibody. 

The present invention further provides methods of identifying test samples
derived from cells which express one of the ORFs of the present invention, or a
homolog thereof. Such methods comprise incubating a test sample with one or
more of the antibodies of the present invention, or one or more of the DFs of the
present invention, under conditions which allow a skilled artisan to determine if the
sample contains the ORF or product produced therefrom.

In another embodiment of the present invention, kits are provided which
contain the necessary reagents to carry out the above-described assays.

Specifically, the invention provides a compartmentalized kit adapted to receive,
in close confinement, one or more containers, which kit comprises: (a) a first
container comprising one of the antibodies, or one of the DFs, of the present
invention; and (b) one or more other containers comprising one or more of the
following: wash reagents, and reagents capable of detecting the presence of bound
antibodies or hybridized DFs.

Using the isolated proteins of the present invention, the present invention
further provides methods of obtaining and identifying agents capable of binding to
a polypeptide or protein encoded by one of the ORFs of the present invention.
Specifically, such agents include, as further described below, antibodies, peptides,
carbohydrates, pharmaceutical agents and the like. Such methods comprise the steps
of: (a) contacting an agent with an isolated protein encoded by one of the ORFs of
the present invention; and (b) determining whether the agent binds to said protein.

The present genomic sequences of Streptococcus pneumoniae will be of
great value to all laboratories working with this organism and for a variety of
commercial purposes. Many fragments of the Streptococcus pneumoniae genome
will be immediately identified by similarity searches against GenBank or protein
databases and will be of immediate value to Streptococcus pneumoniae researchers
and of immediate commercial value for the production of proteins or to control
gene expression.

The methodology and technology for elucidating extensive genomic
sequences of bacterial and other genomes have greatly enhanced, and will continue
to enhance, the ability to analyze and understand chromosomal organization. In
particular, sequenced contigs and genomes will provide the models for developing
tools for the analysis of chromosome structure and function, including the ability
to identify genes within large segments of genomic DNA, the structure, position,
and spacing of regulatory elements, the identification of genes with potential
industrial applications, and the ability to perform comparative genomics and
molecular phylogeny.

DESCRIPTION OF THE FIGURES 

FIGURE 1 is a block diagram of a computer system (102) that can be
used to implement computer-based systems of the present invention.

FIGURE 2 is a schematic diagram depicting the data flow and computer
programs used to collect, assemble, edit and annotate the contigs of the
Streptococcus pneumoniae genome of the present invention. Both Macintosh and
Unix platforms are used to handle the AB 373 and 377 sequence data files, largely
as described in Kerlavage et al., Proceedings of the Twenty-Sixth Annual Hawaii
International Conference on System Sciences, 585, IEEE Computer Society Press,
Washington D.C. (1993). Factura (AB) is a Macintosh program designed for
automatic vector sequence removal and end-trimming of sequence files. The
program Loadis runs on a Macintosh platform and parses the feature data extracted
from the sequence files by Factura into the Unix-based Streptococcus pneumoniae
relational database. Assembly of contigs (and whole genome sequences) is
accomplished by retrieving a specific set of sequence files and their associated
features using Extrseq, a Unix utility for retrieving sequences from an SQL
database. The resulting sequence file is processed by seq_filter to trim portions of
the sequences with more than 2% ambiguous nucleotides. The sequence files were
assembled using TIGR Assembler, an assembly engine designed at The Institute
for Genomic Research (TIGR) for rapid and accurate assembly of thousands of
sequence fragments. The collection of contigs generated by the assembly step is
loaded into the database with the lassie program. Identification of open reading
frames (ORFs) is accomplished by processing contigs with zorf or GenMark. The
ORFs are searched against S. pneumoniae sequences from GenBank and against all
protein sequences using the BLASTN and BLASTP programs, described in
Altschul et al., J. Mol. Biol. 215:403-410 (1990). Results of the ORF
determination and similarity searching steps were loaded into the database. As
described below, some results of the determination and the searches are set out in
Tables 1-3.
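
The programs named above (seq_filter, zorf, GenMark) are not specified in detail here. Purely as an illustration of two of the steps, the Python sketch below shows one plausible reading of each: keeping the longest stretch of a read in which at most 2% of positions are ambiguous (non-A/C/G/T), and a naive ATG-to-stop scan over the three reading frames of both strands. Both functions are hypothetical stand-ins rather than reconstructions of those programs (GenMark, for instance, relies on statistical models of coding sequence rather than a plain codon scan), and the 100-codon minimum is an arbitrary assumption.

```python
# Hypothetical illustration of two pipeline steps; not the original programs.

STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string; non-ACGT symbols pass through unchanged."""
    return seq.translate(COMPLEMENT)[::-1]


def trim_ambiguous(read: str, max_fraction: float = 0.02) -> str:
    """Return the longest substring of `read` in which the fraction of
    ambiguous (non-A/C/G/T) symbols is <= max_fraction (leftmost on ties)."""
    n = len(read)
    prefix = [0] * (n + 1)  # prefix[k] = number of ambiguous symbols in read[:k]
    for i, base in enumerate(read.upper()):
        prefix[i + 1] = prefix[i] + (base not in "ACGT")
    best_start, best_len = 0, 0
    for start in range(n):
        # Try ends from longest to shortest, only for lengths that beat best_len.
        for end in range(n, start + best_len, -1):
            length = end - start
            if prefix[end] - prefix[start] <= max_fraction * length:
                best_start, best_len = start, length
                break
    return read[best_start:best_start + best_len]


def find_orfs(seq: str, min_codons: int = 100):
    """Yield (strand, start, end, orf) for ATG...stop open reading frames.

    Coordinates are 0-based and end-exclusive on the strand that was scanned
    ("+" = given sequence, "-" = its reverse complement). Each ORF starts at
    the first ATG after the previous stop in that frame; `min_codons` counts
    codons before the stop codon."""
    seq = seq.upper()
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    if (i - start) // 3 >= min_codons:
                        yield strand, start, i + 3, s[start:i + 3]
                    start = None


if __name__ == "__main__":
    read = "ACGTACGTAC" + "NNNNN" + "ACGT" * 10   # made-up read
    print(trim_ambiguous(read))
    toy = "CC" + "ATG" + "AAA" * 5 + "TAA" + "GG"  # made-up contig
    print(list(find_orfs(toy, min_codons=3)))
```

The prefix-sum array lets each candidate window's ambiguity count be read in constant time, so the trimming cost stays modest for read-length inputs.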

DETAILED DESCRIPTION OF ILLUSTRATIVE EMBODIMENTS

The present invention is based on the sequencing of fragments of the
Streptococcus pneumoniae genome and analysis of the sequences. The primary
nucleotide sequences generated by sequencing the fragments are provided in SEQ
ID NOS: 1-391. (As used herein, the "primary sequence" refers to the nucleotide
sequence represented by the IUPAC nomenclature system.)

In addition to the aforementioned Streptococcus pneumoniae polynucleotides
and polynucleotide sequences, the present invention provides the nucleotide
sequences of SEQ ID NOS: 1-391, or representative fragments thereof, in a form
which can be readily used, analyzed, and interpreted by a skilled artisan.

As used herein, a "representative fragment of the nucleotide sequence
depicted in SEQ ID NOS: 1-391" refers to any portion of SEQ ID NOS: 1-391
which is not presently represented within a publicly available database. Preferred
representative fragments of the present invention are Streptococcus pneumoniae
open reading frames (ORFs), expression modulating fragments (EMFs) and
fragments which can be used to diagnose the presence of Streptococcus
pneumoniae in a sample (DFs). A non-limiting identification of preferred
representative fragments is provided in Tables 1-3. As discussed in detail below,
the information provided in SEQ ID NOS: 1-391 and in Tables 1-3, together with
routine cloning, synthesis, sequencing and assay methods, will enable those skilled
in the art to clone and sequence all "representative fragments" of interest, including
open reading frames encoding a large variety of Streptococcus pneumoniae
proteins.

While the presently disclosed sequences of SEQ ID NOS: 1-391 are highly
accurate, sequencing techniques are not perfect and, in relatively rare instances,
further investigation of a fragment or sequence of the invention may reveal a
nucleotide sequence error present in a nucleotide sequence disclosed in SEQ ID
NOS: 1-391. However, once the present invention is made available (i.e., once the
information in SEQ ID NOS: 1-391 and Tables 1-3 has been made available),
resolving a rare sequencing error in SEQ ID NOS: 1-391 will be well within the
skill of the art. The present disclosure makes available sufficient sequence
information to allow any of the described contigs or portions thereof to be obtained
readily by straightforward application of routine techniques. Further sequencing of
such polynucleotides may proceed in like manner using manual and automated
sequencing methods, which are ubiquitous in the art. Nucleotide sequence editing
software is publicly available. For example, Applied Biosystems' (AB) Auto
Assembler can be used as an aid during visual inspection of nucleotide sequences.
By employing such routine techniques, potential errors readily may be identified,
and the correct sequence then may be ascertained by targeting further sequencing
effort, also of a routine nature, to the region containing the potential error.

Even if all of the very rare sequencing errors in SEQ ID NOS: 1-391 were 
corrected, the resulting nucleotide sequences would still be at least 95% identical, 
nearly all would be at least 99% identical, and the great majority would be at least 
99.9% identical to the nucleotide sequences of SEQ ID NOS: 1-391. 

As discussed elsewhere herein, polynucleotides of the present invention
readily may be obtained by routine application of well known and standard
procedures for cloning and sequencing DNA. For instance, detailed methods for
obtaining libraries and for sequencing are provided below. A wide variety of
Streptococcus pneumoniae strains that can be used to prepare S. pneumoniae
genomic DNA for cloning and for obtaining polynucleotides of the present
invention are available to the public from recognized depository institutions, such
as the American Type Culture Collection (ATCC). While the present invention is
enabled by the sequences and other information herein disclosed, the S.
pneumoniae strain that provided the DNA of the present Sequence Listing, Strain
7/87 14.8.91, has been deposited in the ATCC as a convenience to those of skill
in the art. As a further convenience, a library of S. pneumoniae genomic DNA,
derived from the same strain, also has been deposited in the ATCC. The S.
pneumoniae strain was deposited on October 10, 1996, and was given Deposit No.
55840, and the genomic DNA library was deposited on October 11, 1996, and was
given Deposit No. 97755. The genomic fragments in the library are 15 to 20 kb
fragments generated by partial Sau3AI digestion, and they are inserted into the
BamHI site of the well-known lambda-derived vector lambda DASH II (Stratagene,
La Jolla, CA). The provision of the deposits is not a waiver of any rights of the
inventors or their assignees in the present subject matter.
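
The cloning strategy works because Sau3AI (site GATC, cut before the G) and BamHI (site GGATCC, cut after the first G) leave the same four-nucleotide 5' GATC overhang, so Sau3AI fragments can be ligated into a BamHI-cut vector. The hypothetical Python sketch below locates top-strand cut positions for both enzymes and lists the fragment lengths a complete digest would give; a partial digest, as used for the library, cuts only a random subset of these sites, which is what leaves the larger fragments selected for cloning. Function names and the example sequence are assumptions for illustration.

```python
# Hypothetical sketch: top-strand cut positions for Sau3AI (^GATC) and
# BamHI (G^GATCC). Both sites are palindromic, so scanning one strand suffices.

ENZYMES = {
    # name: (recognition site, cut offset from the first base of the site)
    "Sau3AI": ("GATC", 0),
    "BamHI": ("GGATCC", 1),
}


def cut_positions(seq: str, enzyme: str) -> list:
    """Return 0-based top-strand cut positions for every site, overlaps included."""
    site, offset = ENZYMES[enzyme]
    seq = seq.upper()
    positions = []
    i = seq.find(site)
    while i != -1:
        positions.append(i + offset)
        i = seq.find(site, i + 1)
    return positions


def complete_digest(seq: str, enzyme: str) -> list:
    """Top-strand fragment lengths if every site were cut."""
    cuts = [0] + cut_positions(seq, enzyme) + [len(seq)]
    return [b - a for a, b in zip(cuts, cuts[1:]) if b > a]


if __name__ == "__main__":
    demo = "AAGGATCCTTGATCAA"  # made-up sequence
    print(cut_positions(demo, "BamHI"))      # [3]
    print(cut_positions(demo, "Sau3AI"))     # [3, 10]
    print(complete_digest(demo, "Sau3AI"))   # [3, 7, 6]
```

Every BamHI site contains a Sau3AI site, and in the example the BamHI cut (position 3) coincides with one of the Sau3AI cuts, which is the property the library construction relies on.
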

The nucleotide sequences of the genomes from different strains of
Streptococcus pneumoniae differ somewhat. However, the nucleotide sequences
of the genomes of all Streptococcus pneumoniae strains will be at least 95%
identical, in corresponding part, to the nucleotide sequences provided in SEQ ID
NOS: 1-391. Nearly all will be at least 99% identical, and the great majority will be
at least 99.9% identical.

Thus, the present invention further provides nucleotide sequences which 
are at least 95%, preferably 99% and most preferably 99.9% identical to the 
nucleotide sequences of SEQ ID NOS: 1-391, in a form which can be readily used, 
analyzed and interpreted by the skilled artisan. 

Methods for determining whether a nucleotide sequence is at least 95%, at
least 99% or at least 99.9% identical to the nucleotide sequences of SEQ ID
NOS: 1-391 are routine and readily available to the skilled artisan. For example, the
well known FASTA algorithm described in Pearson and Lipman, Proc. Natl. Acad.
Sci. USA 85:2444 (1988) can be used to generate the percent identity of nucleotide
sequences. The BLASTN program also can be used to generate an identity score
of polynucleotides compared to one another.
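
The percent identity that FASTA or BLASTN reports is computed over the alignment the program itself produces, and conventions differ on how gaps and unaligned ends are counted. As a minimal sketch, assuming a pairwise alignment is already in hand (gaps written as "-"), the Python below counts identical columns over all aligned columns, treating gap columns as mismatches, and reports which of the 95%, 99% and 99.9% thresholds named above is met. The function names and the gap convention are assumptions, not the method of any particular program.

```python
# Minimal sketch: percent identity of a precomputed pairwise alignment and the
# highest of the thresholds discussed above (95%, 99%, 99.9%) that it meets.

THRESHOLDS = (99.9, 99.0, 95.0)  # checked from most to least stringent


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Identical columns / aligned columns * 100. Columns with a gap ('-') in
    one sequence count as mismatches; columns gapped in both are skipped."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(aligned_a.upper(), aligned_b.upper())
               if not (x == "-" and y == "-")]
    if not columns:
        raise ValueError("alignment has no informative columns")
    matches = sum(1 for x, y in columns if x == y and x != "-")
    return 100.0 * matches / len(columns)


def highest_threshold_met(pid: float):
    """Return the most stringent threshold met by `pid`, or None if below 95%."""
    for threshold in THRESHOLDS:
        if pid >= threshold:
            return threshold
    return None


if __name__ == "__main__":
    a = "ACGTACGTTA-GC"   # made-up aligned pair
    b = "ACGTACCTTAAGC"
    pid = percent_identity(a, b)
    print(round(pid, 2), highest_threshold_met(pid))
```

Dividing by the length of the shorter sequence instead of the alignment length is another common choice; it gives higher values when one sequence has long overhangs, so the convention should be stated alongside any reported identity.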

COMPUTER RELATED EMBODIMENTS 

The nucleotide sequences provided in SEQ ID NOS: 1-391, a representative
fragment thereof, or a nucleotide sequence at least 95%, preferably at least 99%
and most preferably at least 99.9% identical to a polynucleotide sequence of SEQ
ID NOS: 1-391 may be "provided" in a variety of mediums to facilitate use thereof.
As used herein, "provided" refers to a manufacture, other than an isolated nucleic
acid molecule, which contains a nucleotide sequence of the present invention; i.e.,
a nucleotide sequence provided in SEQ ID NOS: 1-391, a representative fragment
thereof, or a nucleotide sequence at least 95%, preferably at least 99% and most
preferably at least 99.9% identical to a polynucleotide of SEQ ID NOS: 1-391.
Such a manufacture provides a large portion of the Streptococcus pneumoniae
genome and parts thereof (e.g., a Streptococcus pneumoniae open reading frame
(ORF)) in a form which allows a skilled artisan to examine the manufacture using
means not directly applicable to examining the Streptococcus pneumoniae genome
or a subset thereof as it exists in nature or in purified form.

In one application of this embodiment, a nucleotide sequence of the present
invention can be recorded on computer readable media. As used herein, "computer
readable media" refers to any medium which can be read and accessed directly by a
computer. Such media include, but are not limited to: magnetic storage media,
such as floppy discs, hard disc storage medium, and magnetic tape; optical storage
media such as CD-ROM; electrical storage media such as RAM and ROM; and
hybrids of these categories, such as magnetic/optical storage media. A skilled
artisan can readily appreciate how any of the presently known computer readable
mediums can be used to create a manufacture comprising computer readable
medium having recorded thereon a nucleotide sequence of the present invention.
Likewise, it will be clear to those of skill how additional computer readable media
that may be developed also can be used to create analogous manufactures having
recorded thereon a nucleotide sequence of the present invention.

As used herein, "recorded" refers to a process for storing information on a
computer readable medium. A skilled artisan can readily adopt any of the presently
known methods for recording information on computer readable media to generate
manufactures comprising the nucleotide sequence information of the present
invention. A variety of data storage structures are available to a skilled artisan
for creating a computer readable medium having recorded thereon a nucleotide
sequence of the present invention. The choice of data storage structure will
generally be based on the means chosen to access the stored information. In
addition, a variety of data processor programs and formats can be used to store the
nucleotide sequence information of the present invention on a computer readable
medium. The sequence information can be represented in a word processing text
file formatted in commercially available software such as WordPerfect and
Microsoft Word, represented in the form of an ASCII file, or stored in a database
application such as DB2, Sybase, Oracle, or the like. A skilled artisan can readily
adapt any number of data-processor structuring formats (e.g., text file or database)
in order to obtain a computer readable medium having recorded thereon the
nucleotide sequence information of the present invention.
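The following Python sketch illustrates the ASCII-file option. It writes a sequence as a FASTA record and reads it back. FASTA is a common plain-text convention that the text above does not itself name. The record name `SEQ_ID_1` and the repeated sequence are hypothetical placeholders, not data from this disclosure.

```python
import io

def write_fasta(handle, name, sequence, width=60):
    """Write one FASTA record: a '>' header line, then fixed-width sequence lines."""
    handle.write(f">{name}\n")
    for i in range(0, len(sequence), width):
        handle.write(sequence[i:i + width] + "\n")

def read_fasta(handle):
    """Parse FASTA text into a dict mapping record name -> full sequence."""
    records = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {key: "".join(parts) for key, parts in records.items()}

# An in-memory text buffer stands in for a file on any storage medium.
buffer = io.StringIO()
write_fasta(buffer, "SEQ_ID_1", "ATGAAACGT" * 10)  # hypothetical sequence
buffer.seek(0)
print(read_fasta(buffer))
```

The same functions work unchanged on a file opened in text mode. The file then plays the role of the "manufacture" described above.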

Computer software is publicly available which allows a skilled artisan to
access sequence information provided in a computer readable medium. Thus, by
providing in computer readable form the nucleotide sequences of SEQ ID NOS: 1-391,
a representative fragment thereof, or a nucleotide sequence at least 95%,
preferably at least 99% and most preferably at least 99.9% identical to a sequence
of SEQ ID NOS: 1-391, the present invention enables the skilled artisan routinely to
access the provided sequence information for a wide variety of purposes.
The examples which follow demonstrate how software implementing the
BLAST (Altschul et al., J. Mol. Biol. 215:403-410 (1990)) and BLAZE
(Brutlag et al., Comp. Chem. 17:203-207 (1993)) search algorithms on a Sybase
system was used to identify open reading frames (ORFs) within the Streptococcus
pneumoniae genome which contain homology to ORFs or proteins from both
Streptococcus pneumoniae and other organisms. Among the ORFs discussed
herein are protein-encoding fragments of the Streptococcus pneumoniae genome
useful in producing commercially important proteins, such as enzymes used in
fermentation reactions and in the production of commercially useful metabolites.

The present invention further provides systems, particularly computer-based
systems, which contain the sequence information described herein. Such
systems are designed to identify, among other things, commercially important
fragments of the Streptococcus pneumoniae genome.

As used herein, "a computer-based system" refers to the hardware means,
software means, and data storage means used to analyze the nucleotide sequence
information of the present invention. The minimum hardware means of the
computer-based systems of the present invention comprises a central processing
unit (CPU), input means, output means, and data storage means. A skilled artisan
can readily appreciate that any one of the currently available computer-based
systems is suitable for use in the present invention.

As stated above, the computer-based systems of the present invention
comprise a data storage means having stored therein a nucleotide sequence of the
present invention and the necessary hardware means and software means for
supporting and implementing a search means.

As used herein, "data storage means" refers to memory which can store
nucleotide sequence information of the present invention, or a memory access
means which can access manufactures having recorded thereon the nucleotide
sequence information of the present invention.

As used herein, "search means" refers to one or more programs which are
implemented on the computer-based system to compare a target sequence or target
structural motif with the sequence information stored within the data storage
means. Search means are used to identify fragments or regions of the present
genomic sequences which match a particular target sequence or target motif. A
variety of known algorithms are publicly disclosed, and a variety of commercially
available software for conducting searches is available and can be used in the
computer-based systems of the present invention. Examples of such software
include, but are not limited to, MacPattern (EMBL), BLASTN and BLASTX
(NCBI). A skilled artisan can readily recognize that any one of the available
algorithms or implementing software packages for conducting homology searches
can be adapted for use in the present computer-based systems.

As used herein, a "target sequence" can be any DNA or amino acid
sequence of six or more nucleotides or two or more amino acids. A skilled artisan
can readily recognize that the longer a target sequence is, the less likely it is to
be present as a random occurrence in the database. The most preferred length of a
target sequence is from about 10 to 100 amino acids or from about 30 to 300
nucleotide residues. However, it is well recognized that searches for commercially
important fragments, such as sequence fragments involved in gene expression and
protein processing, may involve shorter targets.
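A search means can be as simple as an exact-match scan of the stored sequence. The sketch below is a simplified stand-in, not BLAST or any other program named in this section. It reports every exact occurrence of a nucleotide target on either strand. The six-nucleotide minimum mirrors the definition above. The function names are hypothetical.

```python
MIN_TARGET_NT = 6  # "six or more nucleotides", per the definition above
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def find_target(genome, target):
    """List (0-based offset, strand) for every exact match of `target`.

    A '-' hit means the reverse complement of `target` occurs at that
    offset of `genome`, i.e. the target lies on the opposite strand.
    """
    genome, target = genome.upper(), target.upper()
    if len(target) < MIN_TARGET_NT:
        raise ValueError("target shorter than the minimum length")
    hits = []
    for strand, query in (("+", target), ("-", reverse_complement(target))):
        pos = genome.find(query)
        while pos != -1:                  # step by 1 so overlapping hits are found
            hits.append((pos, strand))
            pos = genome.find(query, pos + 1)
    return sorted(hits)

print(find_target("AATTGACAGGTGTCAAGG", "TTGACA"))
```

In the example, the target TTGACA occurs directly at offset 2. Its reverse complement, TGTCAA, occurs at offset 10.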

As used herein, a "target structural motif," or "target motif," refers to any
rationally selected sequence or combination of sequences in which the sequence(s)
are chosen based on a three-dimensional configuration which is formed upon the
folding of the target motif. There are a variety of target motifs known in the art.
Protein target motifs include, but are not limited to, enzymic active sites and signal
sequences. Nucleic acid target motifs include, but are not limited to, promoter
sequences, hairpin structures and inducible expression elements (protein binding
sequences).

A variety of structural formats for the input and output means can be used
to input and output the information in the computer-based systems of the present
invention. A preferred format for an output means ranks fragments of the
Streptococcus pneumoniae genomic sequences possessing varying degrees of
homology to the target sequence or target motif. Such presentation provides a
skilled artisan with a ranking of sequences which contain various amounts of the
target sequence or target motif and identifies the degree of homology contained in
the identified fragment.
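One way to produce such a ranked output is sketched below. The sketch slides an ungapped window the length of the target along a genomic sequence and orders the windows by percent identity. This is a deliberately simplified assumption. Homology search programs such as BLAST instead score gapped local alignments and report statistical significance. The function names are hypothetical.

```python
def percent_identity(a, b):
    """Percent of aligned positions with identical characters (equal lengths)."""
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)

def rank_fragments(genome, target, top=5):
    """Rank ungapped windows of `genome` by percent identity to `target`.

    Returns up to `top` tuples (percent_identity, 0-based start, window),
    highest identity first; ties are broken by the earlier position.
    """
    genome, target = genome.upper(), target.upper()
    k = len(target)
    scored = [
        (percent_identity(genome[i:i + k], target), i, genome[i:i + k])
        for i in range(len(genome) - k + 1)
    ]
    scored.sort(key=lambda item: (-item[0], item[1]))
    return scored[:top]

for score, start, window in rank_fragments("GGGACGTAAACGAA", "ACGT", top=3):
    print(f"{score:5.1f}%  pos {start}  {window}")
```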

A variety of comparing means can be used to compare a target sequence or
target motif with the data storage means to identify sequence fragments of the
Streptococcus pneumoniae genome. In the present examples, software which
implements the BLAST and BLAZE algorithms, described in
Altschul et al., J. Mol. Biol. 215:403-410 (1990), is used to identify open reading
frames within the Streptococcus pneumoniae genome. A skilled artisan can readily
recognize that any one of the publicly available homology search programs can be
used as the search means for the computer-based systems of the present invention.
Of course, suitable proprietary systems that may be known to those of skill also
may be employed in this regard.

Figure 1 provides a block diagram of a computer system illustrative of
embodiments of this aspect of the present invention. The computer system 102
includes a processor 106 connected to a bus 104. Also connected to the bus 104
are a main memory 108 (preferably implemented as random access memory, RAM)
and a variety of secondary storage devices 110, such as a hard drive 112 and a
removable medium storage device 114. The removable medium storage device 114
may represent, for example, a floppy disk drive, a CD-ROM drive, a magnetic tape
drive, etc. A removable storage medium 116 (such as a floppy disk, a compact
disk, a magnetic tape, etc.) containing control logic and/or data recorded therein
may be inserted into the removable medium storage device 114. The computer
system 102 includes appropriate software for reading the control logic and/or the
data from the removable storage medium 116 once it is inserted into the
removable medium storage device 114.

A nucleotide sequence of the present invention may be stored in a well-known
manner in the main memory 108, any of the secondary storage devices 110,
and/or a removable storage medium 116. During execution, software for accessing
and processing the genomic sequence (such as search tools, comparing tools, etc.)
resides in main memory 108, in accordance with the requirements and operating
parameters of the operating system, the hardware system and the software program
or programs.

BIOCHEMICAL EMBODIMENTS

Other embodiments of the present invention are directed to isolated
fragments of the Streptococcus pneumoniae genome. The fragments of the
Streptococcus pneumoniae genome of the present invention include, but are not
limited to, fragments which encode peptides and polypeptides, hereinafter open
reading frames (ORFs); fragments which modulate the expression of an operably
linked ORF, hereinafter expression modulating fragments (EMFs); and fragments
which can be used to diagnose the presence of Streptococcus pneumoniae in a
sample, hereinafter diagnostic fragments (DFs).

As used herein, an "isolated nucleic acid molecule" or an "isolated fragment
of the Streptococcus pneumoniae genome" refers to a nucleic acid molecule
possessing a specific nucleotide sequence which has been subjected to purification
means to reduce, from the composition, the number of compounds which are
normally associated with the composition. Particularly, the term refers to the
nucleic acid molecules having the sequences set out in SEQ ID NOS: 1-391, to
representative fragments thereof as described above, and to polynucleotides at least
95%, preferably at least 99% and especially preferably at least 99.9% identical in
sequence thereto, also as set out above.

A variety of purification means can be used to generate the isolated
fragments of the present invention. These include, but are not limited to, methods
which separate constituents of a solution based on charge, solubility, or size.

In one embodiment, Streptococcus pneumoniae DNA can be enzymatically
sheared to produce fragments of 15-20 kb in length. These fragments can then be
used to generate a Streptococcus pneumoniae library by inserting them into lambda
clones as described in the Examples below. Primers flanking, for example, an
ORF, such as those enumerated in Tables 1-3, can then be generated using
nucleotide sequence information provided in SEQ ID NOS: 1-391. Well-known
and routine techniques of PCR cloning then can be used to isolate the ORF from
the lambda DNA library or Streptococcus pneumoniae genomic DNA. Thus, given
the availability of SEQ ID NOS: 1-391, the information in Tables 1, 2 and 3, and
the information that may be obtained readily by analysis of the sequences of SEQ
ID NOS: 1-391 using methods set out above, those of skill will be enabled by the
present disclosure to isolate any ORF-containing or other nucleic acid fragment of
the present invention.
The isolated nucleic acid molecules of the present invention include, but are
not limited to, single stranded and double stranded DNA, and single stranded RNA.

As used herein, an "open reading frame" (ORF) means a series of triplets
coding for amino acids without any termination codons and is a sequence
translatable into protein.

Tables 1, 2, and 3 list ORFs in the Streptococcus pneumoniae genomic
contigs of the present invention that were identified as putative coding regions by
the GeneMark software using organism-specific second-order Markov probability
transition matrices. It will be appreciated that other criteria can be used, in
accordance with well-known analytical methods, such as those discussed herein, to
generate more inclusive, more restrictive, or more selective lists.
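GeneMark's actual models are not reproduced here. The sketch below illustrates what a second-order Markov transition matrix is, under stated assumptions. It estimates P(base | two preceding bases) from training sequences. It then scores a candidate region by its log-likelihood ratio under a "coding" model versus a "background" model. GeneMark itself uses frame-dependent coding models, so a single homogeneous model is a simplification. The training sequences and function names are toy placeholders.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"

def train_second_order(sequences, pseudocount=1.0):
    """Estimate P(next base | two preceding bases) with additive smoothing.

    Returns a dict mapping each two-base context (e.g. "AT") to a dict of
    next-base probabilities; assumes sequences contain only A, C, G, T.
    """
    counts = defaultdict(lambda: defaultdict(float))
    for seq in sequences:
        seq = seq.upper()
        for i in range(2, len(seq)):
            counts[seq[i - 2:i]][seq[i]] += 1
    model = {}
    for a in ALPHABET:
        for b in ALPHABET:
            ctx = a + b
            total = sum(counts[ctx].values()) + len(ALPHABET) * pseudocount
            model[ctx] = {x: (counts[ctx][x] + pseudocount) / total for x in ALPHABET}
    return model

def log_odds(seq, coding_model, background_model):
    """Log-likelihood ratio of `seq` under the coding vs. background model."""
    seq = seq.upper()
    return sum(
        math.log(coding_model[seq[i - 2:i]][seq[i]] / background_model[seq[i - 2:i]][seq[i]])
        for i in range(2, len(seq))
    )

# Toy training data (hypothetical), far smaller than any real training set.
coding = train_second_order(["ATGATGATGATGATG"])
background = train_second_order(["ACGTACGTACGTACGT"])
print(log_odds("ATGATGATG", coding, background))
```

A positive score means the region looks more like the coding training data than the background data.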

Table 1 sets out ORFs in the Streptococcus pneumoniae contigs of the
present invention that over a continuous region of at least 50 bases are 95% or
more identical (by BLAST analysis) to a nucleotide sequence available through
GenBank in October, 1997.

Table 2 sets out ORFs in the Streptococcus pneumoniae contigs of the 
present invention that are not in Table 1 and match, with a BLASTP probability 
score of 0.01 or less, a polypeptide sequence available through GenBank in 
October, 1997. 

Table 3 sets out ORFs in the Streptococcus pneumoniae contigs of the
present invention that do not match significantly, by BLASTP analysis, a
polypeptide sequence available through GenBank in October, 1997.
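The three table definitions amount to a simple decision rule, which the sketch below encodes as described in the text. The `OrfHit` record and its field names are hypothetical. The numbers passed in would come from BLAST and BLASTP searches, which are not performed here.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class OrfHit:
    """Best database matches for one ORF (hypothetical record layout)."""
    nt_identity_pct: Optional[float]      # best BLAST nucleotide identity, if any
    nt_region_len: int                    # length (bases) of that continuous region
    blastp_probability: Optional[float]   # best BLASTP probability score, if any

def assign_table(hit: OrfHit) -> int:
    """Return 1, 2 or 3 following the table definitions in the text."""
    if (hit.nt_identity_pct is not None
            and hit.nt_identity_pct >= 95.0
            and hit.nt_region_len >= 50):
        return 1                          # >=95% identical over >=50 bases
    if hit.blastp_probability is not None and hit.blastp_probability <= 0.01:
        return 2                          # not Table 1, BLASTP probability <= 0.01
    return 3                              # no significant BLASTP match

print(assign_table(OrfHit(nt_identity_pct=88.0, nt_region_len=300,
                          blastp_probability=1e-12)))
```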

In each table, the first and second columns identify the ORF by,
respectively, contig number and ORF number within the contig; the third column,
"start (nt)," indicates the first nucleotide of the ORF (actually the first nucleotide
of the stop codon immediately preceding the ORF), counting from the 5' end of the
contig strand; and the fourth column, "stop (nt)," indicates the last nucleotide of the
stop codon defining the 3' end of the ORF.

In Tables 1 and 2, column five lists the reference for the closest
matching sequence available through GenBank. These reference numbers are the
database entry numbers commonly used by those of skill in the art, who will be
familiar with these designations. Descriptions of the nomenclature are available
from the National Center for Biotechnology Information. Column six in Tables 1
and 2 provides the gene name of the matching sequence; column seven provides
the BLAST identity score and column eight the BLAST similarity score from the
comparison of the ORF and the homologous gene; and column nine indicates the
length in nucleotides of the highest scoring segment pair identified by the BLAST
identity analysis.

Each ORF described in the tables is defined by "start (nt)" (5') and "stop
(nt)" (3') nucleotide position numbers. These position numbers refer to the
boundaries of each ORF and indicate whether the forward or reverse strand is the
coding strand and in which reading frame the coding sequence is contained. The
"start" position is the first nucleotide of the triplet encoding a stop codon just 5'
to the ORF, and the "stop" position is the last nucleotide of the triplet encoding the
next in-frame stop codon (i.e., the stop codon at the 3' end of the ORF). Those of
ordinary skill in the art appreciate that preferred fragments within each ORF
described in the tables include fragments of each ORF which include the entire
sequence from the delineated "start" and "stop" positions excepting the first and
last three nucleotides, since these encode stop codons. Thus, polynucleotides set
out as ORFs in the tables but lacking the three (3) 5' nucleotides and the three (3)
3' nucleotides are encompassed by the present invention. Those of skill also
appreciate that particularly preferred are fragments within each ORF that are
polynucleotide fragments comprising polypeptide coding sequence. As defined
herein, "coding sequence" includes the fragment within an ORF beginning at the
first in-frame ATG (triplet encoding methionine) and ending with the last
nucleotide prior to the triplet encoding the 3' stop codon. Preferred are fragments
comprising the entire coding sequence and fragments comprising the entire coding
sequence excepting the coding sequence for the N-terminal methionine. Those of
skill appreciate that the N-terminal methionine is often removed during
post-translational processing and that polynucleotides lacking the ATG can be used
to facilitate production of N-terminal fusion proteins, which may be beneficial in
the production or use of genetically engineered proteins. Of course, due to the
degeneracy of the genetic code, many polynucleotides can encode a given
polypeptide. Thus, the invention further includes polynucleotides comprising a
nucleotide sequence encoding a polypeptide sequence itself encoded by the coding
sequence within an ORF described in Tables 1-3 herein. Further, polynucleotides
at least 95%, preferably at least 99% and especially preferably at least 99.9%
identical in sequence to the foregoing polynucleotides are contemplated by the
present invention.
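The coordinate convention described above can be made concrete with a short sketch. It assumes 1-based inclusive positions on the contig. It also assumes that a "start" larger than the "stop" denotes an ORF on the reverse strand, an inference from the statement that the positions indicate orientation. Under those assumptions, the function below recovers the ORF without its flanking stop codons, plus the coding sequence from the first in-frame ATG. The function name and toy contig are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def orf_segments(contig, start, stop):
    """Interpret table-style (start, stop) ORF coordinates.

    Positions are 1-based and inclusive. `start` is the first nucleotide of
    the stop codon just 5' of the ORF; `stop` is the last nucleotide of the
    ORF's own stop codon. start > stop is taken to mean the reverse strand.
    Returns (strand, ORF without both stop codons, coding sequence or None).
    """
    contig = contig.upper()
    if start <= stop:
        strand, orf = "+", contig[start - 1:stop]
    else:
        strand = "-"
        orf = contig[stop - 1:start].translate(COMPLEMENT)[::-1]
    codons = [orf[i:i + 3] for i in range(0, len(orf), 3)]
    if (len(orf) % 3 or len(codons) < 2
            or codons[0] not in STOP_CODONS or codons[-1] not in STOP_CODONS
            or any(c in STOP_CODONS for c in codons[1:-1])):
        raise ValueError("coordinates do not bracket a stop-free reading frame")
    inner = "".join(codons[1:-1])        # drop the 5' and 3' stop codons
    coding = None
    for i, codon in enumerate(codons[1:-1]):
        if codon == "ATG":               # first in-frame ATG starts the coding sequence
            coding = "".join(codons[1 + i:-1])
            break
    return strand, inner, coding

toy = "CC" + "TAA" + "GCC" + "ATG" + "AAA" + "TGA" + "GG"   # hypothetical contig
print(orf_segments(toy, 3, 17))
```

Here the ORF proper is GCCATGAAA, and the coding sequence, ATGAAA, starts at its first in-frame ATG. Reading the reverse complement of the toy contig with (start, stop) = (17, 3) gives the same segments on the "-" strand.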
Polypeptides encoded by the polynucleotides described above and elsewhere
herein are also provided by the present invention, as are polypeptides comprising
an amino acid sequence at least about 95%, preferably at least 97% and even more
preferably 99% identical to the amino acid sequence of a polypeptide encoded by an
ORF shown in Tables 1-3. These polypeptides may or may not comprise an
N-terminal methionine.

The concepts of percent identity and percent similarity of two polypeptide
sequences are well understood in the art. For example, two polypeptides 10 amino
acids in length which differ at three amino acid positions (e.g., at positions 1, 3
and 5) are said to have a percent identity of 70% (7 of 10 positions identical).
However, the same two polypeptides would be deemed to have a percent similarity
of 80% if, for example, at position 5 the amino acid moieties, although not
identical, were "similar" (i.e., possessed similar biochemical characteristics), so
that 8 of 10 positions are either identical or similar. Many programs for analysis of
nucleotide or amino acid sequence similarity, such as FASTA and BLAST,
specifically list the percent identity of a matching region as an output parameter.
Thus, for instance, Tables 1 and 2 herein enumerate the percent identity of the
highest scoring segment pair in each ORF and its listed relative. Further details
concerning the algorithms and criteria used for homology searches are provided
below and are described in the pertinent literature highlighted by the citations
provided below.
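The worked example above can be reproduced in a few lines. Real programs judge similarity with substitution matrices such as BLOSUM62. The residue groups below are a hypothetical simplification used only to illustrate the arithmetic. The two 10-residue sequences are invented so that they differ at positions 1, 3 and 5, with only position 5 (Y versus F) counted as similar.

```python
# Hypothetical, simplified groups of "similar" residues, for illustration only;
# real programs use substitution matrices such as BLOSUM62 instead.
SIMILAR_GROUPS = ("ILVM", "DE", "KRH", "ST", "FYW", "NQ", "AG")

def is_similar(a, b):
    """True if residues are identical or fall in the same hypothetical group."""
    return a == b or any(a in group and b in group for group in SIMILAR_GROUPS)

def identity_and_similarity(seq1, seq2):
    """Percent identity and percent similarity of two equal-length aligned sequences."""
    if len(seq1) != len(seq2) or not seq1:
        raise ValueError("sequences must be aligned, non-empty and of equal length")
    n = len(seq1)
    identical = sum(a == b for a, b in zip(seq1, seq2))
    similar = sum(is_similar(a, b) for a, b in zip(seq1, seq2))
    return 100.0 * identical / n, 100.0 * similar / n

# Differ at positions 1 (M/W), 3 (T/P) and 5 (Y/F); only Y/F counts as similar.
print(identity_and_similarity("MKTAYIAKQR", "WKPAFIAKQR"))
```

For these two sequences the call returns `(70.0, 80.0)`, matching the 70% identity and 80% similarity described above.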

It will be appreciated that other criteria can be used to generate more
inclusive and more exclusive listings of the types set out in the tables. As those of
skill will appreciate, narrow and broad searches both are useful. Thus, a skilled
artisan can readily identify ORFs in contigs of the Streptococcus pneumoniae
genome other than those listed in Tables 1-3, such as ORFs which overlap an
identified ORF or are encoded by its opposite strand, in addition to those
ascertainable using the computer-based systems of the present invention.

As used herein, an "expression modulating fragment" (EMF) means a
series of nucleotide molecules which modulates the expression of an operably
linked ORF or EMF.
As used herein, a sequence is said to "modulate the expression of an
operably linked sequence" when the expression of the sequence is altered by the
presence of the EMF. EMFs include, but are not limited to, promoters and
promoter modulating sequences (inducible elements). One class of EMFs are
fragments which induce the expression of an operably linked ORF in response to a
specific regulatory factor or physiological event.

EMF sequences can be identified within the contigs of the Streptococcus
pneumoniae genome by their proximity to the ORFs provided in Tables 1-3. An
intergenic segment, or a fragment of the intergenic segment, from about 10 to 200
nucleotides in length, taken from any one of the ORFs of Tables 1-3 will modulate
the expression of an operably linked ORF in a fashion similar to that found with the
naturally linked ORF sequence. As used herein, an "intergenic segment" refers to a
fragment of the Streptococcus pneumoniae genome which lies between two of the
ORFs herein described. EMFs also can be identified using known EMFs as a
target sequence or target motif in the computer-based systems of the present
invention. Further, the two methods can be combined and used together.
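Locating candidate intergenic segments is a matter of interval arithmetic over the ORF coordinates. The sketch below assumes the ORFs are supplied as 1-based inclusive (low, high) pairs. It skips gaps shorter than 10 nucleotides and trims longer gaps to the 200 nucleotides nearest the next ORF. The trimming rule and the names are illustrative assumptions. Which end of a gap is "upstream" actually depends on the strand of the neighboring ORF.

```python
def intergenic_segments(contig, orfs, min_len=10, max_len=200):
    """Return (begin, end, sequence) for stretches lying between ORFs.

    `orfs` holds 1-based inclusive (low, high) coordinates regardless of
    strand. Gaps shorter than `min_len` (including overlaps) are skipped;
    longer gaps are trimmed to the `max_len` bases nearest the next ORF.
    """
    intervals = sorted(orfs)
    segments = []
    covered_to = intervals[0][1] if intervals else 0   # rightmost ORF base seen so far
    for low, high in intervals[1:]:
        begin, end = covered_to + 1, low - 1
        if end - begin + 1 >= min_len:
            begin = max(begin, end - max_len + 1)      # keep at most max_len bases
            segments.append((begin, end, contig[begin - 1:end]))
        covered_to = max(covered_to, high)             # handles nested ORFs
    return segments

contig = "A" * 320                                      # hypothetical contig
for begin, end, seq in intergenic_segments(contig, [(1, 20), (36, 50), (301, 320)]):
    print(begin, end, len(seq))
```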

The presence and activity of an EMF can be confirmed using an EMF trap
vector. An EMF trap vector contains a cloning site linked to a marker sequence. A
marker sequence encodes an identifiable phenotype, such as antibiotic resistance or
a complementing nutrition auxotrophic factor, which can be identified or assayed
when the EMF trap vector is placed within an appropriate host under appropriate
conditions. A more detailed discussion of various marker sequences is provided
below. A sequence which is suspected of being an EMF is cloned in all three
reading frames in one or more restriction sites upstream from the marker sequence
in the EMF trap vector. The vector is then transformed into an appropriate host
using known procedures, and the phenotype of the transformed host is examined
under appropriate conditions. As described above, an EMF will modulate the
expression of an operably linked marker sequence.

As used herein, a "diagnostic fragment" (DF) means a series of nucleotide
molecules which selectively hybridize to Streptococcus pneumoniae sequences.
DFs can be readily identified by identifying unique sequences within contigs of the
Streptococcus pneumoniae genome, such as by using well-known computer
analysis software, and by generating and testing probes or amplification primers
consisting of the DF sequence in an appropriate diagnostic format which
determines amplification or hybridization selectivity.
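Identifying unique sequences can be sketched as counting every k-mer on both strands of a contig and keeping those seen exactly once. This is a simplification with strong assumptions. Uniqueness within the supplied sequence says nothing about cross-hybridization with other organisms, which is why probes or primers must still be tested in a diagnostic format. The default length of 20 follows one of the preferred DF lengths given in this disclosure. The function names are hypothetical.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def unique_kmers(genome, k=20):
    """Return forward-strand k-mers that occur exactly once across both strands."""
    genome = genome.upper()
    counts = Counter()
    for strand in (genome, reverse_complement(genome)):
        for i in range(len(strand) - k + 1):
            counts[strand[i:i + k]] += 1
    return [genome[i:i + k] for i in range(len(genome) - k + 1)
            if counts[genome[i:i + k]] == 1]

print(unique_kmers("AAAAC", k=2))
```

A palindromic k-mer, one equal to its own reverse complement, is counted once per strand and is therefore never reported as unique.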

The sequences falling within the scope of the present invention are not
limited to the specific sequences herein described, but also include allelic and
species variations thereof. Allelic and species variations can be routinely
determined by comparing the sequences provided in SEQ ID NOS: 1-391, a
representative fragment thereof, or a nucleotide sequence at least 95%, preferably
at least 99% and most preferably at least 99.9% identical to SEQ ID NOS: 1-391,
with a sequence from another isolate of the same species. Furthermore, to
accommodate codon variability, the invention includes nucleic acid molecules
coding for the same amino acid sequences as do the specific ORFs disclosed
herein. In other words, in the coding region of an ORF, substitution of one codon
for another which encodes the same amino acid is expressly contemplated. Any
specific sequence disclosed herein can be readily screened for errors by
resequencing a particular fragment, such as an ORF, in both directions (i.e.,
sequencing both strands). Alternatively, error screening can be performed by
sequencing corresponding polynucleotides of Streptococcus pneumoniae origin
isolated by using part or all of the fragments in question as a probe or primer.

Preferred DFs of the present invention comprise at least about 17,
preferably at least about 20, and more preferably at least about 50 contiguous
nucleotides within an ORF set out in Tables 1-3. Most highly preferred DFs
specifically hybridize to a polynucleotide containing the sequence of the ORF from
which they are derived. Specific hybridization occurs even under stringent
conditions defined elsewhere herein.

Each of the ORFs of the Streptococcus pneumoniae genome disclosed in
Tables 1, 2 and 3, and the EMFs found 5' to the ORFs, can be used as
polynucleotide reagents in numerous ways. For example, the sequences can be
used as diagnostic probes or diagnostic amplification primers to detect the presence
of a specific microbe in a sample, particularly Streptococcus pneumoniae.
Especially preferred in this regard are ORFs such as those of Table 3, which do not
match previously characterized sequences from other organisms and thus are most
likely to be highly selective for Streptococcus pneumoniae. Also particularly
preferred are ORFs that can be used to distinguish between strains of Streptococcus
pneumoniae, particularly those that distinguish medically important strains, such as
drug-resistant strains.
In addition, the fragments of the present invention, as broadly described,
can be used to control gene expression through triple helix formation or antisense
DNA or RNA, both of which methods are based on the binding of a polynucleotide
sequence to DNA or RNA. Triple helix formation optimally results in a shut-off of
RNA transcription from DNA, while antisense RNA hybridization blocks
translation of an mRNA molecule into polypeptide. Information from the
sequences of the present invention can be used to design antisense and
triple-helix-forming oligonucleotides. Polynucleotides suitable for use in these
methods are usually 20 to 40 bases in length and are designed to be complementary
to a region of the gene involved in transcription, for triple-helix formation, or to the
mRNA itself, for antisense inhibition. Both techniques have been demonstrated to be
effective in model systems, and the requisite techniques are well known and
involve routine procedures. Triple helix techniques are discussed in, for example,
Lee et al., Nucl. Acids Res. 6:3073 (1979); Cooney et al., Science 241:456
(1988); and Dervan et al., Science 251:1360 (1991). Antisense techniques in
general are discussed in, for instance, Okano, J. Neurochem. 56:560 (1991) and
Oligodeoxynucleotides as Antisense Inhibitors of Gene Expression, CRC Press,
Boca Raton, FL (1988).
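The sketch below illustrates the antisense design rule. It takes a 20- to 40-base window of a coding-strand DNA sequence, whose mRNA has the same sequence with U in place of T. It returns the complementary, antiparallel DNA oligonucleotide written 5' to 3'. The function names and the sequence are hypothetical. Practical design also weighs factors such as target accessibility, which are not modeled here.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
RNA_PARTNER = {"A": "U", "C": "G", "G": "C", "T": "A"}  # DNA base -> paired RNA base

def antisense_oligo(coding_strand, start, length=20):
    """Antisense DNA oligo (5'->3') for coding_strand[start:start+length], 0-based."""
    if not 20 <= length <= 40:
        raise ValueError("length outside the 20-40 base range described above")
    window = coding_strand.upper()[start:start + length]
    if len(window) != length:
        raise ValueError("window runs past the end of the sequence")
    return window.translate(COMPLEMENT)[::-1]

def pairs_with(oligo, mrna_window):
    """Check antiparallel Watson-Crick pairing of a DNA oligo with an mRNA window."""
    if len(oligo) != len(mrna_window):
        return False
    # The oligo's 3' end pairs with the mRNA's 5' end, so read the oligo backwards.
    return all(RNA_PARTNER[b] == m for b, m in zip(reversed(oligo), mrna_window))

coding = "ATGGCTAGCTAGGATCCATTAGC"                 # hypothetical coding strand
oligo = antisense_oligo(coding, 0)
mrna = coding[:20].replace("T", "U")               # mRNA has U in place of T
print(oligo, pairs_with(oligo, mrna))
```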

The present invention further provides recombinant constructs comprising
one or more fragments of the Streptococcus pneumoniae genomic fragments and
contigs of the present invention. Certain preferred recombinant constructs of the
present invention comprise a vector, such as a plasmid or viral vector, into which a
fragment of the Streptococcus pneumoniae genome has been inserted, in a forward
or reverse orientation. In the case of a vector comprising one of the ORFs of the
present invention, the vector may further comprise regulatory sequences,
including, for example, a promoter, operably linked to the ORF. For vectors
comprising the EMFs of the present invention, the vector may further comprise a
marker sequence or heterologous ORF operably linked to the EMF.

Large numbers of suitable vectors and promoters are known to those of
skill in the art and are commercially available for generating the recombinant
constructs of the present invention. The following vectors are provided by way of
example. Useful bacterial vectors include phagescript, psiX174, pBluescript SK,
pBS KS, pNH8a, pNH16a, pNH18a, pNH46a (available from Stratagene); and
pTrc99A, pKK223-3, pKK233-3, pDR540, pRIT5 (available from Pharmacia).
Useful eukaryotic vectors include pWLneo, pSV2cat, pOG44, pXT1, pSG
(available from Stratagene); and pSVK3, pBPV, pMSG, pSVL (available from
Pharmacia).

Promoter regions can be selected from any desired gene using CAT
(chloramphenicol acetyltransferase) vectors or other vectors with selectable
markers. Two appropriate vectors are pKK232-8 and pCM7. Particular named
bacterial promoters include lacI, lacZ, T3, T7, gpt, lambda PR, and trc. Eukaryotic
promoters include CMV immediate early, HSV thymidine kinase, early and late
SV40, LTRs from retrovirus, and mouse metallothionein-I. Selection of the
appropriate vector and promoter is well within the level of ordinary skill in the art.

The present invention further provides host cells containing any one of the
isolated fragments of the Streptococcus pneumoniae genomic fragments and
contigs of the present invention, wherein the fragment has been introduced into the
host cell using known methods. The host cell can be a higher eukaryotic host
cell, such as a mammalian cell; a lower eukaryotic host cell, such as a yeast cell; or
a prokaryotic cell, such as a bacterial cell.

A polynucleotide of the present invention, such as a recombinant construct
comprising an ORF of the present invention, may be introduced into the host by a
variety of well-established techniques that are standard in the art, such as calcium
phosphate transfection, DEAE-dextran mediated transfection and electroporation,
which are described in, for instance, Davis, L. et al., BASIC METHODS IN
MOLECULAR BIOLOGY (1986).

A host cell containing one of the fragments of the Streptococcus
pneumoniae genomic fragments and contigs of the present invention can be used
in conventional manners to produce the gene product encoded by the isolated
fragment (in the case of an ORF) or can be used to produce a heterologous protein
under the control of the EMF.

The present invention further provides isolated polypeptides encoded by the
nucleic acid fragments of the present invention or by degenerate variants of the
nucleic acid fragments of the present invention. By "degenerate variant" is intended
nucleotide fragments which differ from a nucleic acid fragment of the present
invention (e.g., an ORF) by nucleotide sequence but, due to the degeneracy of the
genetic code, encode an identical polypeptide sequence.
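Degenerate variants can be checked by translation. The sketch below builds the standard genetic code table, whose amino acid assignments bacteria also use. It then confirms that two hypothetical DNA sequences differing at three codons encode the same tetrapeptide. The sequences and function name are illustrative only.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG;
# '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(dna):
    """Translate an in-frame DNA sequence codon by codon."""
    dna = dna.upper()
    if len(dna) % 3:
        raise ValueError("length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))

original = "ATGCTGAAAGGC"   # hypothetical: ATG CTG AAA GGC
variant = "ATGTTAAAGGGT"    # synonymous codons: TTA (Leu), AAG (Lys), GGT (Gly)
print(translate(original), translate(variant))
```

Both sequences translate to MLKG, so the variant is a degenerate variant of the original in the sense defined above.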

Preferred nucleic acid fragments of the present invention are the ORFs and 
subfragments thereof depicted in Tables 2 and 3 which encode proteins. 
A variety of methodologies known in the art can be utilized to obtain any
one of the isolated polypeptides or proteins of the present invention. At the
simplest level, the amino acid sequence can be synthesized using commercially
available peptide synthesizers. This is particularly useful in producing small
peptides and fragments of larger polypeptides. Such short fragments, which may be
obtained most readily by synthesis, are useful, for example, in generating antibodies
against the native polypeptide, as discussed further below.

In an alternative method, the polypeptide or protein is purified from
bacterial cells which naturally produce the polypeptide or protein. One skilled in
the art can readily employ well-known methods for isolating polypeptides and
proteins to isolate and purify polypeptides or proteins of the present invention
produced naturally by a bacterial strain, or by other methods. Methods for
isolation and purification that can be employed in this regard include, but are not
limited to, immunochromatography, HPLC, size-exclusion chromatography,
ion-exchange chromatography, and immuno-affinity chromatography.

The polypeptides and proteins of the present invention also can be purified
from cells which have been altered to express the desired polypeptide or protein.
As used herein, a cell is said to be altered to express a desired polypeptide or
protein when the cell, through genetic manipulation, is made to produce a
polypeptide or protein which it normally does not produce or which the cell
normally produces at a lower level. Those skilled in the art can readily adapt
procedures for introducing and expressing either recombinant or synthetic
sequences into eukaryotic or prokaryotic cells in order to generate a cell which
produces one of the polypeptides or proteins of the present invention.

Any host/vector system can be used to express one or more of the ORFs of
the present invention. These include, but are not limited to, eukaryotic hosts such
as HeLa cells, CV-1 cells, COS cells, and Sf9 cells, as well as prokaryotic hosts
such as E. coli and B. subtilis. The most preferred cells are those which do not
normally express the particular polypeptide or protein or which express the
polypeptide or protein at a low natural level.



"Recombinant," as used herein, means that a polypeptide or protein is
derived from recombinant (e.g., microbial or mammalian) expression systems.
"Microbial" refers to recombinant polypeptides or proteins made in bacterial or
fungal (e.g., yeast) expression systems. As a product, "recombinant
microbial" defines a polypeptide or protein essentially free of native endogenous
substances and unaccompanied by associated native glycosylation. Polypeptides or
proteins expressed in most bacterial cultures, e.g., E. coli, will be free of
glycosylation modifications; polypeptides or proteins expressed in yeast will have a
glycosylation pattern different from that of those expressed in mammalian cells.

"Nucleotide sequence" refers to a heteropolymer of deoxyribonucleotides.
Generally, DNA segments encoding the polypeptides and proteins provided by this
invention are assembled from fragments of the Streptococcus pneumoniae genome
and short oligonucleotide linkers, or from a series of oligonucleotides, to provide a
synthetic gene which is capable of being expressed in a recombinant transcriptional
unit comprising regulatory elements derived from a microbial or viral operon.

"Recombinant expression vehicle or vector" refers to a plasmid, phage,
virus or other vector for expressing a polypeptide from a DNA (RNA) sequence. The
expression vehicle can comprise a transcriptional unit comprising an assembly of
(1) genetic regulatory elements necessary for gene expression in the host,
including elements required to initiate and maintain transcription at a level sufficient
for suitable expression of the desired polypeptide, for example,
promoters and, where necessary, an enhancer and a polyadenylation signal; (2) a
structural or coding sequence which is transcribed into mRNA and translated into
protein; and (3) appropriate signals to initiate translation at the beginning of the
desired coding region and terminate translation at its end. Structural units intended
for use in yeast or eukaryotic expression systems preferably include a leader
sequence enabling extracellular secretion of translated protein by a host cell.
Alternatively, where recombinant protein is expressed without a leader or transport
sequence, it may include an N-terminal methionine residue. This residue may or
may not be subsequently cleaved from the expressed recombinant protein to
provide a final product.
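Components (2) and (3) of the transcriptional unit can be checked directly on a candidate coding sequence. The sketch below is an illustration only and is not part of the disclosure. It assumes the standard genetic code and treats ATG as the only initiation codon; many bacteria also initiate at GTG or TTG, which this simplification ignores. The function names `check_coding_sequence` and `translate` are hypothetical. The code reports whether a sequence starts with an initiation codon, stays in reading phase, and ends with a single terminal stop codon. It then translates the sequence to show the N-terminal methionine described above.

```python
# Standard genetic code: the 64 codons enumerated in T, C, A, G order at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def check_coding_sequence(cds: str) -> list:
    """Return a list of structural problems; an empty list means none were found."""
    cds = cds.upper()
    problems = []
    if len(cds) % 3 != 0:
        problems.append("length is not a multiple of 3")
    # Split into complete codons only (a trailing partial codon is ignored here).
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    if not codons or codons[0] != "ATG":
        problems.append("does not start with ATG")
    if not codons or codons[-1] not in STOP_CODONS:
        problems.append("does not end with a stop codon")
    if any(codon in STOP_CODONS for codon in codons[:-1]):
        problems.append("internal in-frame stop codon")
    return problems


def translate(cds: str) -> str:
    """Translate codon by codon, stopping at the first in-frame stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        residue = CODON_TABLE[cds[i:i + 3].upper()]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


orf = "ATGGCTAAATGA"  # hypothetical toy ORF: Met-Ala-Lys-stop
print(check_coding_sequence(orf))  # []
print(translate(orf))              # MAK
```

A real structural sequence would be hundreds of codons long. The toy ORF only keeps the example readable.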

"Recombinant expression system" means host cells which have stably 
integrated a recombinant transcriptional unit into chromosomal DNA or carry the
recombinant transcriptional unit extrachromosomally. The cells can be prokaryotic
or eukaryotic. Recombinant expression systems as defined herein will express
heterologous polypeptides or proteins upon induction of the regulatory elements
linked to the DNA segment or synthetic gene to be expressed.

Mature proteins can be expressed in mammalian cells, yeast, bacteria, or
other cells under the control of appropriate promoters. Cell-free translation
systems can also be employed to produce such proteins using RNAs derived from
the DNA constructs of the present invention. Appropriate cloning and expression
vectors for use with prokaryotic and eukaryotic hosts are described in Sambrook et
al., Molecular Cloning: A Laboratory Manual, 2nd Edition, Cold Spring Harbor
Laboratory Press, Cold Spring Harbor, New York (1989), the disclosure of which
is hereby incorporated by reference in its entirety.

Generally, recombinant expression vectors will include origins of
replication and selectable markers permitting transformation of the host cell, e.g.,
the ampicillin resistance gene of E. coli and the S. cerevisiae TRP1 gene, and a
promoter derived from a highly expressed gene to direct transcription of a
downstream structural sequence. Such promoters can be derived from operons
encoding glycolytic enzymes such as 3-phosphoglycerate kinase (PGK), or from
genes encoding alpha-factor, acid phosphatase, or heat shock proteins, among
others. The heterologous structural sequence is assembled in appropriate phase
with translation initiation and termination sequences and, preferably, a leader
sequence capable of directing secretion of translated protein into the periplasmic
space or extracellular medium. Optionally, the heterologous sequence can encode a
fusion protein including an N-terminal identification peptide imparting desired
characteristics, e.g., stabilization or simplified purification of expressed
recombinant product.
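The requirement that the heterologous sequence be assembled "in appropriate phase" matters most for an N-terminal fusion. If the tag's coding sequence is not a whole number of codons, or if it contains a stop codon, the downstream ORF is either shifted out of frame or never translated. The sketch below assumes the standard genetic code and ATG initiation. The function `fuse_n_terminal_tag` is hypothetical. The hexahistidine tag is used only as a familiar example of an identification peptide for simplified purification; the text does not name it.

```python
# Standard genetic code: the 64 codons enumerated in T, C, A, G order at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def fuse_n_terminal_tag(tag_cds: str, orf_cds: str) -> str:
    """Join a tag coding sequence to the 5' end of an ORF, refusing fusions that break phase."""
    tag_cds, orf_cds = tag_cds.upper(), orf_cds.upper()
    if len(tag_cds) % 3 != 0:
        raise ValueError("tag length must be a multiple of 3 to keep the ORF in phase")
    tag_codons = [tag_cds[i:i + 3] for i in range(0, len(tag_cds), 3)]
    if not tag_codons or tag_codons[0] != "ATG":
        raise ValueError("tag must begin with an ATG initiation codon")
    if any(codon in STOP_CODONS for codon in tag_codons):
        raise ValueError("tag contains an in-frame stop codon")
    return tag_cds + orf_cds


def translate(cds: str) -> str:
    """Translate codon by codon, stopping at the first in-frame stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        residue = CODON_TABLE[cds[i:i + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


his_tag = "ATG" + "CAC" * 6  # illustrative identification peptide: Met + six His
fused = fuse_n_terminal_tag(his_tag, "ATGGCTAAATGA")  # toy ORF: Met-Ala-Lys-stop
print(translate(fused))  # MHHHHHHMAK
```

Here the ORF's own initiator methionine is kept inside the fusion. Whether to drop it, or to add a protease cleavage site between tag and protein, is a design choice the text leaves open.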

Useful expression vectors for bacterial use are constructed by inserting a
structural DNA sequence encoding a desired protein together with suitable
translation initiation and termination signals in operable reading phase with a
functional promoter. The vector will comprise one or more phenotypic selectable
markers and an origin of replication to ensure maintenance of the vector and, when
desirable, provide amplification within the host.

Suitable prokaryotic hosts for transformation include strains of E. coli,
B. subtilis, Salmonella typhimurium and various species within the genera
Pseudomonas and Streptomyces. Others may also be employed as a matter of
choice.

As a representative but non-limiting example, useful expression vectors for
bacterial use can comprise a selectable marker and bacterial origin of replication
derived from commercially available plasmids comprising genetic elements of the
well-known cloning vector pBR322 (ATCC 37017). Such commercial vectors
include, for example, pKK223-3 (available from Pharmacia Fine Chemicals,
Uppsala, Sweden) and GEM1 (available from Promega Biotec, Madison, WI,
USA). These pBR322 "backbone" sections are combined with an appropriate
promoter and the structural sequence to be expressed.

Following transformation of a suitable host strain and growth of the host
strain to an appropriate cell density, the selected promoter, where it is inducible, is
derepressed or induced by appropriate means (e.g., temperature shift or chemical
induction) and cells are cultured for an additional period to provide for expression
of the induced gene product. Thereafter, cells are typically harvested, generally by
centrifugation, disrupted to release expressed protein, generally by physical or
chemical means, and the resulting crude extract is retained for further purification.

Various mammalian cell culture systems can also be employed to express
recombinant protein. Examples of mammalian expression systems include the
COS-7 lines of monkey kidney fibroblasts, described in Gluzman, Cell 23:175
(1981), and other cell lines capable of expressing a compatible vector, for example,
the C127, 3T3, CHO, HeLa and BHK cell lines.

Mammalian expression vectors will comprise an origin of replication, a
suitable promoter and enhancer, and also any necessary ribosome binding sites,
polyadenylation site, splice donor and acceptor sites, transcriptional termination
sequences, and 5' flanking nontranscribed sequences. DNA sequences derived
from the SV40 viral genome, for example, SV40 origin, early promoter, enhancer,
splice, and polyadenylation sites may be used to provide the required
nontranscribed genetic elements.

Recombinant polypeptides and proteins produced in bacterial culture are
usually isolated by initial extraction from cell pellets, followed by one or more
salting-out, aqueous ion exchange or size exclusion chromatography steps.
Microbial cells employed in expression of proteins can be disrupted by any
convenient method, including freeze-thaw cycling, sonication, mechanical
disruption, or use of cell lysing agents. Protein refolding steps can be used, as
necessary, in completing configuration of the mature protein. Finally, high
performance liquid chromatography (HPLC) can be employed for final purification
steps.






The present invention further includes isolated polypeptides, proteins and
nucleic acid molecules which are substantially equivalent to those herein described.
As used herein, substantially equivalent can refer both to nucleic acid and amino
acid sequences, for example a mutant sequence, that varies from a reference
sequence by one or more substitutions, deletions, or additions, the net effect of
which does not result in an adverse functional dissimilarity between reference and
subject sequences. For purposes of the present invention, sequences having
equivalent biological activity and equivalent expression characteristics are
considered substantially equivalent. For purposes of determining equivalence,
truncation of the mature sequence should be disregarded.

The invention further provides methods of obtaining, from other strains of
Streptococcus pneumoniae, homologs of the fragments of the Streptococcus
pneumoniae genome of the present invention and homologs of the proteins encoded
by the ORFs of the present invention. As used herein, a sequence or protein of
Streptococcus pneumoniae is defined as a homolog of a Streptococcus pneumoniae
fragment or contig, or of a protein encoded by one of the ORFs of the present
invention, if it shares significant homology to one of the fragments of the
Streptococcus pneumoniae genome of the present invention or to a protein encoded
by one of the ORFs of the present invention. Specifically, by using the sequences
disclosed herein as probes or as primers, and techniques such as PCR cloning and
colony/plaque hybridization, one skilled in the art can obtain homologs.

As used herein, two nucleic acid molecules or proteins are said to "share
significant homology" if the two contain regions which possess greater than 85%
sequence (amino acid or nucleic acid) homology. Preferred homologs in this
regard are those with more than 90% homology. Especially preferred are those
with 93% or more homology. Among especially preferred homologs, those with
95% or more homology are particularly preferred. Very particularly preferred
among these are those with 97% or more homology, and even more particularly
preferred among those are homologs with 99% or more homology. The most
preferred homologs among these are those with 99.9% homology or more. It will be
understood that, among measures of homology, identity is particularly preferred in
this regard.
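The tiers above can be applied mechanically once two sequences have been aligned. The sketch below measures percent identity, the measure the text prefers. It assumes the sequences were already aligned by some alignment tool, which is not shown. Leading and trailing gap columns are trimmed, which mirrors the instruction to disregard truncation. Counting internal gaps as mismatches is an assumption of this sketch. `percent_identity`, `homology_tier` and the tier labels are hypothetical names.

```python
from typing import List, Optional, Tuple

# (threshold, inclusive?, label) following the tiers recited above, highest first.
TIERS: List[Tuple[float, bool, str]] = [
    (99.9, True, "most preferred (99.9% or more)"),
    (99.0, True, "99% or more"),
    (97.0, True, "97% or more"),
    (95.0, True, "95% or more"),
    (93.0, True, "93% or more"),
    (90.0, False, "more than 90%"),
    (85.0, False, "significant homology (greater than 85%)"),
]


def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned, equal-length sequences.

    Terminal columns in which either sequence has a gap are trimmed (treated as
    truncation); internal gap columns count as mismatches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = list(zip(aligned_a.upper(), aligned_b.upper()))
    start, end = 0, len(columns)
    while start < end and gap in columns[start]:
        start += 1
    while end > start and gap in columns[end - 1]:
        end -= 1
    core = columns[start:end]
    if not core:
        return 0.0
    matches = sum(1 for x, y in core if x == y and x != gap)
    return 100.0 * matches / len(core)


def homology_tier(identity: float) -> Optional[str]:
    """Return the highest tier met, or None below the 85% threshold."""
    for threshold, inclusive, label in TIERS:
        meets = identity >= threshold if inclusive else identity > threshold
        if meets:
            return label
    return None


print(percent_identity("ACGTACGTAC", "ACGTACGTAA"))  # 90.0
print(homology_tier(90.0))  # significant homology (greater than 85%)
print(homology_tier(percent_identity("--ACGTACGTAC", "GGACGTACGTAC")))  # most preferred (99.9% or more)
```

Note that exactly 90% falls short of the "more than 90%" tier, as the text is worded. The two ends of the second example are trimmed as truncation, so the remaining region is identical.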

Region-specific primers or probes derived from the nucleotide sequence
provided in SEQ ID NOS: 1-391 or from a nucleotide sequence at least 95%,
particularly at least 99%, especially at least 99.5% identical to a sequence of SEQ
ID NOS: 1-391 can be used to prime DNA synthesis and PCR amplification, as
well as to identify colonies containing cloned DNA encoding a homolog. Methods
suitable to this aspect of the present invention are well known and have been
described in great detail in many publications such as, for example, Innis et al.,
PCR Protocols, Academic Press, San Diego, CA (1990).
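Deriving a region-specific primer pair from a sequence can be sketched in a few lines. The forward primer is taken directly from the start of the chosen region. The reverse primer is the reverse complement of the region's end, so it anneals to the opposite strand. The melting-temperature estimate uses the Wallace rule of thumb, 2 °C per A/T plus 4 °C per G/C. That rule is a common approximation for short oligonucleotides and is not taken from this text; it ignores the salt and formamide effects that the stringency conditions below depend on. The names `region_primers`, `reverse_complement` and `wallace_tm` are hypothetical, and the template is a placeholder rather than part of any SEQ ID NO.

```python
from typing import Tuple

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def wallace_tm(primer: str) -> int:
    """Rough melting temperature (deg C) by the Wallace rule: 2*(A+T) + 4*(G+C).

    Only a rule of thumb for short oligonucleotides; it ignores salt and formamide.
    """
    p = primer.upper()
    return 2 * (p.count("A") + p.count("T")) + 4 * (p.count("G") + p.count("C"))


def region_primers(template: str, start: int, end: int, length: int = 20) -> Tuple[str, str]:
    """Forward primer = first `length` bases of template[start:end];
    reverse primer = reverse complement of its last `length` bases."""
    region = template[start:end]
    if len(region) < 2 * length:
        raise ValueError("region is too short for two non-overlapping primers")
    return region[:length], reverse_complement(region[-length:])


template = "AAACCCGGGTTTACGTACGT"  # placeholder; real primers are typically ~18-25 nt
fwd, rev = region_primers(template, 0, 20, length=6)
print(fwd, rev)                          # AAACCC ACGTAC
print(wallace_tm(fwd), wallace_tm(rev))  # 18 18
```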

When using primers derived from SEQ ID NOS: 1-391 or from a nucleotide
sequence having an aforementioned identity to a sequence of SEQ ID NOS: 1-391,
one skilled in the art will recognize that by employing high stringency conditions
(e.g., annealing at 50-60°C in 6X SSPC and 50% formamide, and washing at
50-65°C in 0.5X SSPC) only sequences which are greater than 75% homologous to
the primer will be amplified. By employing lower stringency conditions (e.g.,
hybridizing at 35-37°C in 5X SSPC and 40-45% formamide, and washing at 42°C
in 0.5X SSPC), sequences which are greater than 40-50% homologous to the
primer will also be amplified.

When using DNA probes derived from SEQ ID NOS: 1-391, or from a
nucleotide sequence having an aforementioned identity to a sequence of SEQ ID
NOS: 1-391, for colony/plaque hybridization, one skilled in the art will recognize
that by employing high stringency conditions (e.g., hybridizing at 50-65°C in 5X
SSPC and 50% formamide, and washing at 50-65°C in 0.5X SSPC), sequences
having regions which are greater than 90% homologous to the probe can be
obtained, and that by employing lower stringency conditions (e.g., hybridizing at
35-37°C in 5X SSPC and 40-45% formamide, and washing at 42°C in 0.5X
SSPC), sequences having regions which are greater than 35-45% homologous to
the probe will be obtained.

Any organism can be used as the source for homologs of the present
invention so long as the organism naturally expresses such a protein or contains
genes encoding the same. The most preferred organisms for isolating homologs are
bacteria which are closely related to Streptococcus pneumoniae.

ILLUSTRATIVE USES OF COMPOSITIONS OF THE INVENTION

Each ORF provided in Tables 1 and 2 is identified with a function by
homology to a known gene or polypeptide. As a result, one skilled in the art can
use the polypeptides of the present invention for commercial, therapeutic and
industrial purposes consistent with the type of putative identification of the
polypeptide. Such identifications permit one skilled in the art to use the
Streptococcus pneumoniae ORFs in a manner similar to the known type of
sequences for which the identification is made; for example, to ferment a particular
sugar source or to produce a particular metabolite. A variety of reviews illustrative
of this aspect of the invention are available, including the following reviews on the
industrial use of enzymes: BIOCHEMICAL ENGINEERING AND
BIOTECHNOLOGY HANDBOOK, 2nd Ed., MacMillan Publications, Ltd., NY
(1991) and BIOCATALYSTS IN ORGANIC SYNTHESES, Tramper et al., Eds.,
Elsevier Science Publishers, Amsterdam, The Netherlands (1985). A variety of
exemplary uses that illustrate this and similar aspects of the present invention are
discussed below.

1. Biosynthetic Enzymes 

Open reading frames encoding proteins involved in mediating the catalytic
reactions of intermediary and macromolecular metabolism, the biosynthesis of
small molecules, cellular processes and other functions can be used for industrial
biosynthesis. These include enzymes involved in the degradation of the
intermediary products of metabolism, enzymes involved in central intermediary
metabolism, enzymes involved in respiration, both aerobic and anaerobic, enzymes
involved in fermentation, enzymes involved in ATP-proton motive force
conversion, enzymes involved in broad regulatory function, enzymes involved in
amino acid synthesis, enzymes involved in nucleotide synthesis, and enzymes
involved in cofactor and vitamin synthesis.

The various metabolic pathways present in Streptococcus pneumoniae can
be identified based on absolute nutritional requirements as well as by examining the
various enzymes identified in Tables 1-3 and SEQ ID NOS: 1-391.

Of particular interest are polypeptides involved in the degradation of 
intermediary metabolites as well as non-macromolecular metabolism. Such 
enzymes include amylases, glucose oxidases, and catalase. 

Proteolytic enzymes are another class of commercially important enzymes.
Proteolytic enzymes find use in a number of industrial processes including the
processing of flax and other vegetable fibers, the extraction, clarification and
depectinization of fruit juices, the extraction of vegetable oils and the
maceration of fruits and vegetables to give unicellular fruits. A detailed review of
the proteolytic enzymes used in the food industry is provided in Rombouts et al.,
Symbiosis 21:19 (1986) and Voragen et al. in Biocatalysts In Agricultural
Biotechnology, Whitaker et al., Eds., American Chemical Society Symposium
Series 389:93 (1989).

The metabolism of sugars is an important aspect of the primary metabolism
of Streptococcus pneumoniae. Enzymes involved in the degradation of sugars,
such as, particularly, glucose, galactose, fructose and xylose, can be used in
industrial fermentation. Some of the important sugar transforming enzymes, from
a commercial viewpoint, include sugar isomerases such as glucose isomerase.
Other metabolic enzymes have found commercial use, such as glucose oxidases,
which produce ketogulonic acid (KGA). KGA is an intermediate in the
commercial production of ascorbic acid using Reichstein's procedure, as
described in Krueger et al., Biotechnology 6(A), Rhine et al., Eds., Verlag Press,
Weinheim, Germany (1984).

Glucose oxidase (GOD) is commercially available and has been used in
purified form as well as in an immobilized form for the deoxygenation of beer.
See, for instance, Hartmeir et al., Biotechnology Letters 1:21 (1979). The most
important application of GOD is the industrial scale fermentation of gluconic acid.
Gluconic acids are used in the detergent, textile, leather, photographic,
pharmaceutical, food, feed and concrete industries, as described, for
example, in Bigelis et al., beginning on page 357 in GENE MANIPULATIONS
IN FUNGI, Bennett et al., Eds., Academic Press, New York (1985). In addition
to industrial applications, GOD has found applications in medicine for quantitative
determination of glucose in body fluids and, more recently, in biotechnology for
analyzing syrups from starch and cellulose hydrolysates. This application is
described in Owusu et al., Biochim. Biophys. Acta 872:83 (1986), for instance.

The main sweetener used in the world today is sugar, which comes from
sugar beets and sugar cane. In the field of industrial enzymes, the glucose
isomerase process shows the largest expansion in the market today. Initially,
soluble enzymes were used and later immobilized enzymes were developed
(Krueger et al., Biotechnology, The Textbook of Industrial Microbiology, Sinauer
Associates, Inc., Sunderland, Massachusetts (1990)). Today, the use of
glucose-produced high fructose syrups is by far the largest industrial business
using immobilized enzymes. A review of the industrial use of these enzymes is
provided by Jorgensen, Starch 40:301 (1988).

Proteinases, such as alkaline serine proteinases, are used as detergent
additives and thus represent one of the largest volumes of microbial enzymes used
in the industrial sector. Because of their industrial importance, there is a large body
of published and unpublished information regarding the use of these enzymes in
industrial processes. (See Faultman et al., Acid Proteases Structure Function and
Biology, Tang, J., ed., Plenum Press, New York (1977); Godfrey et al.,
Industrial Enzymes, MacMillan Publishers, Surrey, UK (1983); and Hepner et al.,
Report Industrial Enzymes by 1990, Hel Hepner & Associates, London (1986).)

Another class of commercially usable proteins of the present invention is
the microbial lipases, described by, for instance, Macrae et al., Philosophical
Transactions of the Royal Society of London 310:227 (1985) and Poserke, Journal
of the American Oil Chemists' Society 67:1758 (1984). A major use of lipases is in
the fat and oil industry for the production of neutral glycerides using
lipase-catalyzed inter-esterification of readily available triglycerides. Applications
of lipases include use as a detergent additive to facilitate the removal of fats from
fabrics in the course of washing procedures.

The use of enzymes, and in particular microbial enzymes, as catalysts for
key steps in the synthesis of complex organic molecules is gaining popularity at a
great rate. One area of great interest is the preparation of chiral intermediates,
which is of interest to a wide range of synthetic chemists, particularly those
scientists involved with the preparation of new pharmaceuticals, agrochemicals,
fragrances and flavors. (See Davies et al., Recent Advances in the Generation of
Chiral Intermediates Using Enzymes, CRC Press, Boca Raton, Florida (1990).)
The following reactions catalyzed by enzymes are of interest to organic chemists:
hydrolysis of carboxylic acid esters, phosphate esters, amides and nitriles;
esterification reactions; trans-esterification reactions; synthesis of amides;
reduction of alkanones and oxoalkanoates; oxidation of alcohols to carbonyl
compounds; oxidation of sulfides to sulfoxides; and carbon bond forming
reactions such as the aldol reaction.

When considering the use of an enzyme encoded by one of the ORFs of the
present invention for biotransformation and organic synthesis, it is sometimes
necessary to consider the respective advantages and disadvantages of using a
microorganism as opposed to an isolated enzyme. The pros and cons of using a
whole cell system on the one hand or an isolated, partially purified enzyme on the
other hand have been described in detail by Bud et al., Chemistry in Britain (1987),
p. 127.

Amino transferases, enzymes involved in the biosynthesis and metabolism
of amino acids, are useful in the catalytic production of amino acids. The
advantage of using microbial-based enzyme systems is that the amino transferase
enzymes catalyze the stereoselective synthesis of only L-amino acids and
generally possess uniformly high catalytic rates. A description of the use of amino
transferases for amino acid production is provided by Roselle-David, Methods in
Enzymology 136:479 (1987).

Another category of useful proteins encoded by the ORFs of the present
invention includes enzymes involved in nucleic acid synthesis, repair, and
recombination.

2. Generation of Antibodies 

As described here, the proteins of the present invention, as well as
homologs thereof, can be used in a variety of procedures and methods known in
the art which are currently applied to other proteins. The proteins of the present
invention can further be used to generate an antibody which selectively binds the
protein. Such antibodies can be either monoclonal or polyclonal antibodies, as well
as fragments of these antibodies, and humanized forms.

The invention further provides antibodies which selectively bind to one of 
the proteins of the present invention and hybridomas which produce these 
antibodies. A hybridoma is an immortalized cell line which is capable of secreting 
a specific monoclonal antibody. 

In general, techniques for preparing polyclonal and monoclonal antibodies
as well as hybridomas capable of producing the desired antibody are well known in
the art (Campbell, A. M., Monoclonal Antibody Technology: Laboratory
Techniques In Biochemistry And Molecular Biology, Elsevier Science Publishers,
Amsterdam, The Netherlands (1984); St. Groth et al., J. Immunol. Methods
35:1-21 (1980); Kohler and Milstein, Nature 256:495-497 (1975)), and include
the trioma technique and the human B-cell hybridoma technique (Kozbor et al.,
Immunology Today 4:72 (1983); pgs. 77-96 of Cole et al., in Monoclonal
Antibodies And Cancer Therapy, Alan R. Liss, Inc. (1985)). Any animal (mouse,
rabbit, etc.) which is known to produce antibodies can be immunized with the
pseudogene polypeptide. Methods for immunization are well known in the art.
Such methods include subcutaneous or intraperitoneal injection of the polypeptide.
One skilled in the art will recognize that the amount of the protein encoded by the
ORF of the present invention used for immunization will vary based on the animal
which is immunized, the antigenicity of the peptide and the site of injection.

The protein which is used as an immunogen may be modified or
administered in an adjuvant in order to increase the protein's antigenicity. Methods
of increasing the antigenicity of a protein are well known in the art and include, but
are not limited to, coupling the antigen with a heterologous protein (such as globulin
or galactosidase) or the inclusion of an adjuvant during immunization.

For monoclonal antibodies, spleen cells from the immunized animals are
removed, fused with myeloma cells, such as SP2/0-Ag14 myeloma cells, and
allowed to become monoclonal antibody producing hybridoma cells.

Any one of a number of methods well known in the art can be used to
identify the hybridoma cell which produces an antibody with the desired
characteristics. These include screening the hybridomas with an ELISA assay,
western blot analysis, or radioimmunoassay (Lutz et al., Exp. Cell Res.
175:109-124 (1988)).

Hybridomas secreting the desired antibodies are cloned and the class and
subclass are determined using procedures known in the art (Campbell, A. M.,
Monoclonal Antibody Technology: Laboratory Techniques in Biochemistry and
Molecular Biology, Elsevier Science Publishers, Amsterdam, The Netherlands
(1984)).

Techniques described for the production of single chain antibodies (U.S.
Patent 4,946,778) can be adapted to produce single chain antibodies to proteins of
the present invention.

For polyclonal antibodies, antibody-containing antiserum is isolated from the
immunized animal and is screened for the presence of antibodies with the desired
specificity using one of the above-described procedures.

The present invention further provides the above-described antibodies in
detectably labelled form. Antibodies can be detectably labelled through the use of
radioisotopes, affinity labels (such as biotin, avidin, etc.), enzymatic labels (such
as horseradish peroxidase, alkaline phosphatase, etc.), fluorescent labels (such as
FITC or rhodamine, etc.), paramagnetic atoms, etc. Procedures for accomplishing
such labeling are well known in the art; for example, see Sternberger et al., J.
Histochem. Cytochem. 75:315 (1970); Bayer, E. A. et al., Meth. Enzym. 62:308
(1979); Engval, E. et al., Immunol. 109:129 (1972); Goding, J. W., J. Immunol.
Meth. 75:215 (1976).

The labeled antibodies of the present invention can be used for in vitro, in
vivo, and in situ assays to identify cells or tissues in which a fragment of the
Streptococcus pneumoniae genome is expressed.

The present invention further provides the above-described antibodies
immobilized on a solid support. Examples of such solid supports include plastics
such as polycarbonate, complex carbohydrates such as agarose and sepharose,
acrylic resins such as polyacrylamide, and latex beads. Techniques for
coupling antibodies to such solid supports are well known in the art (Weir, D. M.
et al., "Handbook of Experimental Immunology" 4th Ed., Blackwell Scientific
Publications, Oxford, England, Chapter 10 (1986); Jacoby, W. D. et al., Meth.
Enzym. 34, Academic Press, N.Y. (1974)). The immobilized antibodies of the
present invention can be used for in vitro, in vivo, and in situ assays as well as for
immunoaffinity purification of the proteins of the present invention.

3. Diagnostic Assays and Kits 

The present invention further provides methods to identify the expression
of one of the ORFs of the present invention, or homolog thereof, in a test sample,
using one of the DFs or antibodies of the present invention.

In detail, such methods comprise incubating a test sample with one or more 
of the antibodies or one or more of the DFs of the present invention and assaying 
for binding of the DFs or antibodies to components within the test sample. 

Conditions for incubating a DF or antibody with a test sample vary.
Incubation conditions depend on the format employed in the assay, the detection
methods employed, and the type and nature of the DF or antibody used in the
assay. One skilled in the art will recognize that any one of the commonly available
hybridization, amplification or immunological assay formats can readily be adapted
to employ the DFs or antibodies of the present invention. Examples of such assays
can be found in Chard, T., An Introduction to Radioimmunoassay and Related
Techniques, Elsevier Science Publishers, Amsterdam, The Netherlands (1986);
Bullock, G. R. et al., Techniques in Immunocytochemistry, Academic Press,
Orlando, FL Vol. 1 (1982), Vol. 2 (1983), Vol. 3 (1985); Tijssen, P., Practice and
Theory of Enzyme Immunoassays: Laboratory Techniques in Biochemistry and
Molecular Biology, Elsevier Science Publishers, Amsterdam, The Netherlands
(1985).

The test samples of the present invention include cells, protein or membrane
extracts of cells, or biological fluids such as sputum, blood, serum, plasma, or
urine. The test sample used in the above-described method will vary based on the
assay format, nature of the detection method and the tissues, cells or extracts used
as the sample to be assayed. Methods for preparing protein extracts or membrane
extracts of cells are well known in the art and can readily be adapted in order to
obtain a sample which is compatible with the system utilized.

In another embodiment of the present invention, kits are provided which
contain the necessary reagents to carry out the assays of the present invention.

Specifically, the invention provides a compartmentalized kit to receive, in
close confinement, one or more containers which comprise: (a) a first container
comprising one of the DFs or antibodies of the present invention; and (b) one or
more other containers comprising one or more of the following: wash reagents, and
reagents capable of detecting the presence of a bound DF or antibody.

In detail, a compartmentalized kit includes any kit in which reagents are
contained in separate containers. Such containers include small glass containers,
plastic containers or strips of plastic or paper. Such containers allow one to
efficiently transfer reagents from one compartment to another compartment such
that the samples and reagents are not cross-contaminated, and the agents or
solutions of each container can be added in a quantitative fashion from one
compartment to another. Such containers will include a container which will accept
the test sample, a container which contains the antibodies used in the assay,
containers which contain wash reagents (such as phosphate buffered saline,
Tris-buffers, etc.), and containers which contain the reagents used to detect the
bound antibody or DF.

Types of detection reagents include labelled nucleic acid probes, labelled
secondary antibodies, or, in the alternative, if the primary antibody is labelled, the
enzymatic or antibody-binding reagents which are capable of reacting with the
labelled antibody. One skilled in the art will readily recognize that the disclosed
DFs and antibodies of the present invention can be readily incorporated into one of
the established kit formats which are well known in the art.



Using the isolated proteins of the present invention, the present invention
further provides methods of obtaining and identifying agents which bind to a
protein encoded by one of the ORFs of the present invention or to one of the
Streptococcus pneumoniae fragments and contigs herein described.

In general, such methods comprise steps of: 

(a) contacting an agent with an isolated protein encoded by one of the 
ORFs of the present invention, or an isolated fragment of the Streptococcus 
pneumoniae genome; and 

(b) determining whether the agent binds to said protein or said fragment.
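Step (b) needs a rule for calling a measured signal "binding". The text does not specify one. The sketch below uses a common screening convention, which is swapped in here and is not the patent's own rule: an agent is scored as a binder when its readout exceeds the mean of no-binding control wells by more than three sample standard deviations. `measure_binding` is a hypothetical stand-in for whatever assay readout step (a) produces. The agent names and numbers are made up for illustration.

```python
from statistics import mean, stdev
from typing import Callable, Dict, Iterable, List


def binding_cutoff(negative_controls: List[float], k: float = 3.0) -> float:
    """Hit threshold: mean of no-binding controls plus k sample standard deviations."""
    return mean(negative_controls) + k * stdev(negative_controls)


def screen_agents(
    agents: Iterable[str],
    measure_binding: Callable[[str], float],
    negative_controls: List[float],
) -> Dict[str, bool]:
    """Step (a) is represented by measure_binding(agent), an assay readout obtained
    after contacting the agent with the isolated protein; step (b) compares that
    readout with the control-derived cutoff."""
    cutoff = binding_cutoff(negative_controls)
    return {agent: measure_binding(agent) > cutoff for agent in agents}


# Hypothetical assay readouts (arbitrary units), e.g. from a plate-based binding assay.
readouts = {"peptide-1": 0.95, "carbohydrate-2": 0.11}
controls = [0.10, 0.12, 0.08, 0.11, 0.09]
print(screen_agents(readouts, readouts.__getitem__, controls))
# {'peptide-1': True, 'carbohydrate-2': False}
```

The factor `k` trades false positives against missed weak binders. Rational designs with fewer candidates might tolerate a lower cutoff than large random screens.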

The agents screened in the above assay can be, but are not limited to, 
peptides, carbohydrates, vitamin derivatives, or other pharmaceutical agents. The 
agents can be selected and screened at random or rationally selected or designed 
using protein modeling techniques. 

For random screening, agents such as peptides, carbohydrates,
pharmaceutical agents and the like are selected at random and are assayed for their
ability to bind to the protein encoded by the ORF of the present invention.

Alternatively, agents may be rationally selected or designed. As used
herein, an agent is said to be "rationally selected or designed" when the agent is
chosen based on the configuration of the particular protein. For example, one
skilled in the art can readily adapt currently available procedures to generate
peptides, pharmaceutical agents and the like capable of binding to a specific peptide
sequence in order to generate rationally designed antipeptide peptides (see, for
example, Hurby et al., "Application of Synthetic Peptides: Antisense Peptides," in
Synthetic Peptides, A User's Guide, W. H. Freeman, NY (1992), pp. 289-307,
and Kaspczak et al., Biochemistry 28:9230-8 (1989)), or pharmaceutical agents, or
the like.

In addition to the foregoing, one class of agents of the present invention, as broadly described, can be used to control gene expression through binding to one of the ORFs or EMFs of the present invention. As described above, such agents can be randomly screened or rationally designed/selected. Targeting the ORF or EMF allows a skilled artisan to design sequence-specific or element-specific agents that modulate the expression of either a single ORF or multiple ORFs that rely on the same EMF for expression control.

One class of DNA-binding agents comprises agents that contain base residues which hybridize to, or form a triple helix with, DNA or RNA. Such agents can be based on the classic phosphodiester ribonucleic acid backbone, or can be any of a variety of sulfhydryl or polymeric derivatives that have base attachment capacity. Agents suitable for use in these methods usually contain 20 to 40 bases and are designed to be complementary either to a region of the gene involved in transcription (triple helix; see Lee et al., Nucl. Acids Res. 6:3073 (1979); Cooney et al., Science 241:456 (1988); and Dervan et al., Science 257:1360 (1991)) or to the mRNA itself (antisense; see Okano, J. Neurochem. 56:560 (1991); Oligodeoxynucleotides as Antisense Inhibitors of Gene Expression, CRC Press, Boca Raton, FL (1988)). Triple-helix formation optimally results in a shut-off of RNA transcription from DNA, while antisense RNA hybridization blocks translation of an mRNA molecule into polypeptide. Both techniques have been demonstrated to be effective in model systems. Information contained in the sequences of the present invention can be used to design antisense and triple-helix-forming oligonucleotides, and other DNA-binding agents.
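The antisense half of this idea can be illustrated with a minimal Python sketch. It assumes the target mRNA region is given 5' to 3'. For every window in the 20 to 40 base range mentioned above, it reports the reverse complement, written as DNA. The function names and the example target are hypothetical. Real oligonucleotide selection would also weigh target accessibility, specificity and backbone chemistry, none of which is modeled here. Triple-helix design follows different pairing rules and is not covered.

```python
# Hypothetical helpers illustrating antisense oligonucleotide design.
# Sequences are written 5'->3'; U in RNA input is treated like T.

# Maps each base to its Watson-Crick partner (DNA alphabet on output).
_COMPLEMENT = str.maketrans("ACGTUacgtu", "TGCAAtgcaa")


def reverse_complement_dna(seq: str) -> str:
    """Return the DNA reverse complement of a DNA or RNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1].upper()


def antisense_candidates(mrna: str, length: int = 20, step: int = 1):
    """Yield (start, oligo): the antisense DNA oligo for each mRNA window.

    `start` is the 0-based position of the window on the mRNA.
    """
    if not 20 <= length <= 40:
        raise ValueError("length should be within the 20-40 base range")
    mrna = mrna.upper()
    for start in range(0, len(mrna) - length + 1, step):
        yield start, reverse_complement_dna(mrna[start:start + length])


if __name__ == "__main__":
    # Hypothetical target region, not a sequence from this disclosure.
    target = "AUGGCUAGCUAGGCUUACGGAUCCGAUUAG"
    for pos, oligo in antisense_candidates(target, length=20, step=5):
        print(pos, oligo)
```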

5. Pharmaceutical Compositions and Vaccines 

The present invention further provides pharmaceutical agents that can be used to modulate the growth or pathogenicity of Streptococcus pneumoniae, or another related organism, in vivo or in vitro. As used herein, a "pharmaceutical agent" is defined as a composition of matter that can be formulated using known techniques to provide a pharmaceutical composition. As used herein, the "pharmaceutical agents of the present invention" refers to the pharmaceutical agents that are derived from the proteins encoded by the ORFs of the present invention or that are identified using the herein described assays.

As used herein, a pharmaceutical agent is said to "modulate the growth or pathogenicity of Streptococcus pneumoniae or a related organism, in vivo or in vitro" when the agent reduces the rate of growth, rate of division, or viability of the organism in question. The pharmaceutical agents of the present invention can modulate the growth or pathogenicity of an organism in many ways, although an understanding of the underlying mechanism of action is not needed to practice the use of the pharmaceutical agents of the present invention. Some agents will modulate growth by binding to an important protein, thus blocking the biological activity of the protein, while other agents may bind to a component of the outer surface of the organism, blocking attachment or rendering the organism more susceptible to the body's natural immune system. Alternatively, the agent may comprise a protein encoded by one of the ORFs of the present invention and serve as a vaccine. The development and use of vaccines based on outer membrane components are well known in the art.

As used herein, a "related organism" is a broad term that refers to any organism whose growth can be modulated by one of the pharmaceutical agents of the present invention. In general, such an organism will contain a homolog of the protein that is the target of the pharmaceutical agent or of the protein used as a vaccine. As such, related organisms need not be bacterial but may be fungal or viral pathogens.

The pharmaceutical agents and compositions of the present invention may be administered in any convenient manner, such as by the oral, topical, intravenous, intraperitoneal, intramuscular, subcutaneous, intranasal or intradermal routes. The pharmaceutical compositions are administered in an amount that is effective for the treatment and/or prophylaxis of the specific indication. In general, they are administered in an amount of at least about 1 mg/kg body weight, and in most cases they will be administered in an amount not in excess of about 1 g/kg body weight per day. In most cases, the dosage is from about 0.1 mg/kg to about 10 g/kg body weight daily, taking into account the routes of administration, symptoms, etc.

The agents of the present invention can be used in native form or can be modified to form a chemical derivative. As used herein, a molecule is said to be a "chemical derivative" of another molecule when it contains additional chemical moieties not normally a part of the molecule. Such moieties may improve the molecule's solubility, absorption, biological half-life, etc. The moieties may alternatively decrease the toxicity of the molecule, or eliminate or attenuate any undesirable side effect of the molecule, etc. Moieties capable of mediating such effects are disclosed in, among other sources, REMINGTON'S PHARMACEUTICAL SCIENCES (1980), cited elsewhere herein.

For example, such moieties may change an immunological characteristic of the functional derivative, such as its affinity for a given antibody. Such changes in immunomodulatory activity are measured by an appropriate assay, such as a competitive-type immunoassay. Modifications of protein properties such as redox or thermal stability, biological half-life, hydrophobicity, susceptibility to proteolytic degradation, or the tendency to aggregate with carriers or into multimers may also be effected in this way and can be assayed by methods well known to the skilled artisan.

The therapeutic effects of the agents of the present invention may be obtained by providing the agent to a patient by any suitable means (e.g., by inhalation, or intravenously, intramuscularly, subcutaneously, enterally, or parenterally). It is preferred to administer the agent of the present invention so as to achieve an effective concentration within the blood or tissue in which the growth of the organism is to be controlled. To achieve an effective blood concentration, the preferred method is to administer the agent by injection. The administration may be by continuous infusion, or by single or multiple injections.

In providing a patient with one of the agents of the present invention, the dosage of the administered agent will vary depending upon such factors as the patient's age, weight, height, sex, general medical condition, previous medical history, etc. In general, it is desirable to provide the recipient with a dosage of agent in the range of from about 1 pg/kg to 10 mg/kg (body weight of patient), although a lower or higher dosage may be administered. The therapeutically effective dose can be lowered by using the agents of the present invention in combination with one another or with another agent.

As used herein, two or more compounds or agents are said to be administered "in combination" with each other when either (1) the physiological effects of each compound, or (2) the serum concentrations of each compound, can be measured at the same time. The composition of the present invention can be administered concurrently with, prior to, or following the administration of the other agent.

The agents of the present invention are intended to be provided to recipient subjects in an amount sufficient to decrease the rate of growth (as defined above) of the target organism.

The administration of the agent(s) of the invention may be for either a "prophylactic" or a "therapeutic" purpose. When provided prophylactically, the agent(s) are provided in advance of any symptoms indicative of the organism's growth. The prophylactic administration of the agent(s) serves to prevent, attenuate, or decrease the rate of onset of any subsequent infection. When provided therapeutically, the agent(s) are provided at (or shortly after) the onset of an indication of infection. The therapeutic administration of the agent(s) serves to attenuate the pathological symptoms of the infection and to increase the rate of recovery.

The agents of the present invention are administered to a subject, such as a mammal, or a patient, in a pharmaceutically acceptable form and in a therapeutically effective concentration. A composition is said to be "pharmaceutically acceptable" if its administration can be tolerated by a recipient patient. Such an agent is said to be administered in a "therapeutically effective amount" if the amount administered is physiologically significant. An agent is physiologically significant if its presence results in a detectable change in the physiology of a recipient patient.

The agents of the present invention can be formulated according to known methods to prepare pharmaceutically useful compositions, whereby these materials, or their functional derivatives, are combined in a mixture with a pharmaceutically acceptable carrier vehicle. Suitable vehicles and their formulation, inclusive of other human proteins, e.g., human serum albumin, are described, for example, in REMINGTON'S PHARMACEUTICAL SCIENCES, 16th Ed., Osol, A., Ed., Mack Publishing, Easton PA (1980). In order to form a pharmaceutically acceptable composition suitable for effective administration, such compositions will contain an effective amount of one or more of the agents of the present invention, together with a suitable amount of carrier vehicle.

Additional pharmaceutical methods may be employed to control the duration of action. Controlled-release preparations may be achieved through the use of polymers to complex or absorb one or more of the agents of the present invention. The controlled delivery may be effectuated by a variety of well-known techniques, including formulation with macromolecules such as, for example, polyesters, polyamino acids, polyvinyl pyrrolidone, ethylene vinylacetate, methylcellulose, carboxymethylcellulose, or protamine sulfate; adjusting the concentration of the macromolecules and the agent in the formulation; and appropriate use of methods of incorporation, which can be manipulated to effectuate a desired time course of release. Another possible method to control the duration of action by controlled-release preparations is to incorporate agents of the present invention into particles of a polymeric material such as polyesters, polyamino acids, hydrogels, poly(lactic acid) or ethylene vinylacetate copolymers. Alternatively, instead of incorporating these agents into polymeric particles, it is possible to entrap these materials in microcapsules prepared, for example, by coacervation techniques or by interfacial polymerization (for example, hydroxymethylcellulose or gelatin microcapsules and poly(methyl methacrylate) microcapsules, respectively), or in colloidal drug delivery systems (for example, liposomes, albumin microspheres, microemulsions, nanoparticles, and nanocapsules) or in macroemulsions. Such techniques are disclosed in REMINGTON'S PHARMACEUTICAL SCIENCES (1980).

The invention further provides a pharmaceutical pack or kit comprising one or more containers filled with one or more of the ingredients of the pharmaceutical compositions of the invention. Associated with such container(s) can be a notice in the form prescribed by a governmental agency regulating the manufacture, use or sale of pharmaceuticals or biological products, which notice reflects approval by the agency of manufacture, use or sale for human administration.

In addition, the agents of the present invention may be employed in 
conjunction with other therapeutic compounds. 

6. Shot-Gun Approach to Megabase DNA Sequencing 

The present invention further demonstrates that a large sequence can be sequenced using a random shotgun approach. This procedure, described in detail in the examples that follow, eliminates the up-front cost of isolating and ordering overlapping or contiguous subclones prior to the start of the sequencing protocols.

Certain aspects of the present invention are described in greater detail in the 
examples that follow. The examples are provided by way of illustration. Other 
aspects and embodiments of the present invention are contemplated by the 
inventors, as will be clear to those of skill in the art from reading the present 
disclosure. 

ILLUSTRATIVE EXAMPLES 

LIBRARIES AND SEQUENCING 
1. Shotgun Sequencing Probability Analysis 

The overall strategy for a shotgun approach to whole genome sequencing follows from the Lander and Waterman (Lander and Waterman, Genomics 2:231 (1988)) application of the equation for the Poisson distribution. According to this treatment, the probability, $P$, that any given base in a sequence of size $L$, in nucleotides, is not sequenced after a certain amount, $n$, in nucleotides, of random sequence has been determined can be calculated by the equation $P = e^{-m}$, where $m = n/L$ is the fold coverage. For instance, for a genome of 2.8 Mb, $m = 1$ when 2.8 Mb of sequence has been randomly generated (1X coverage). At that point, $P = e^{-1} = 0.37$. The probability that any given base has not been sequenced is the same as the probability that any region of the whole sequence $L$ has not been determined and, therefore, is equivalent to the fraction of the whole sequence that has yet to be determined. Thus, at one-fold coverage, approximately 37% of a polynucleotide of size $L$, in nucleotides, has not been sequenced. When 14 Mb of sequence has been generated, coverage is 5X for a 2.8 Mb sequence and the unsequenced fraction drops to $e^{-5} \approx 0.0067$, or 0.67%. 5X coverage of a 2.8 Mb sequence can be attained by sequencing approximately 17,000 random clones from both insert ends with an average sequence read length of 410 bp (17,000 × 2 × 410 bp ≈ 14 Mb).

Similarly, the total gap length, $G$, is determined by the equation $G = Le^{-m}$, and the average gap size, $g$, follows the equation $g = L/N$, where $N$ is the number of random sequence reads (about 34,000 for 17,000 clones sequenced from both ends). Thus, 5X coverage leaves about 240 gaps averaging about 82 bp in size in a sequence of a polynucleotide 2.8 Mb long.

The treatment above is essentially that of Lander and Waterman, Genomics 
2: 231 (1988). 
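These expectations can be checked numerically. The Python sketch below makes the same idealized assumptions as the Lander and Waterman treatment: reads are placed uniformly at random, and any overlap joins two reads. For the 2.8 Mb example it computes the coverage, the unsequenced fraction, the total gap length, the expected number of gaps ($N e^{-m}$) and the mean gap size. The function names are illustrative only.

```python
import math


def unsequenced_fraction(coverage: float) -> float:
    """P = e^(-m): chance that a given base is still unsequenced at m-fold coverage."""
    return math.exp(-coverage)


def shotgun_stats(genome_len: int, n_reads: int, read_len: float) -> dict:
    """Lander-Waterman expectations for random shotgun sequencing.

    Assumes uniformly random read placement and no minimum-overlap term.
    """
    coverage = n_reads * read_len / genome_len      # m = n / L
    p = unsequenced_fraction(coverage)
    return {
        "coverage": coverage,
        "unsequenced_fraction": p,
        "total_gap_bp": genome_len * p,             # G = L e^(-m)
        "n_gaps": n_reads * p,                      # expected gaps, N e^(-m)
        "mean_gap_bp": genome_len / n_reads,        # g = L / N
    }


if __name__ == "__main__":
    # 17,000 clones sequenced from both ends, 410 bp average read, 2.8 Mb genome
    stats = shotgun_stats(2_800_000, 2 * 17_000, 410)
    for key, value in stats.items():
        print(f"{key}: {value:.4g}")
```

With 34,000 reads of 410 bp, the sketch gives about 5.0-fold coverage, roughly 234 gaps and a mean gap of about 82 bp. These agree with the approximate figures in the text.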

2. Random Library Construction

In order to approximate the random model described above during actual 
sequencing, a nearly ideal library of cloned genomic fragments is required. The 
following library construction procedure was developed to achieve this end. 

Streptococcus pneumoniae DNA is prepared by phenol extraction. A mixture containing 200 µg DNA in 1.0 ml of 300 mM sodium acetate, 10 mM Tris-HCl, 1 mM Na-EDTA, 50% glycerol is processed through a nebulizer (IPI Medical Products) with a stream of nitrogen adjusted to 35 kPa for 2 minutes. The sheared DNA is ethanol precipitated and redissolved in 500 µl TE buffer.

To create blunt ends, a 100 µl aliquot of the resuspended DNA is digested with 5 units of BAL31 nuclease (New England BioLabs) for 10 min at 30°C in 200 µl BAL31 buffer. The digested DNA is phenol-extracted, ethanol-precipitated, redissolved in 100 µl TE buffer, and then size-fractionated by electrophoresis through a 1.0% low melting temperature agarose gel. The section containing DNA fragments 1.6-2.0 kb in size is excised from the gel, the low gelling temperature (LGT) agarose is melted, and the resulting solution is extracted with phenol to separate the agarose from the DNA. The DNA is ethanol precipitated and redissolved in 20 µl of TE buffer for ligation to vector.

A two-step ligation procedure is used to produce a plasmid library in which 97% of clones carry inserts, of which >99% are single inserts. The first ligation mixture (50 µl) contains 2 µg of DNA fragments, 2 µg of pUC18 DNA (Pharmacia) cut with SmaI and dephosphorylated with bacterial alkaline phosphatase, and 10 units of T4 ligase (GIBCO/BRL), and is incubated at 14°C for 4 hr. The ligation mixture then is phenol extracted and ethanol precipitated, and the precipitated DNA is dissolved in 20 µl TE buffer and electrophoresed on a 1.0% low melting agarose gel. Discrete bands in a ladder are visualized by ethidium bromide staining and UV illumination and identified by size as insert (I), vector (v), v+I, v+2I, v+3I, etc. The portion of the gel containing v+I DNA is excised, and the v+I DNA is recovered and resuspended in 20 µl TE. The v+I DNA then is blunt-ended by T4 polymerase treatment for 5 min at 37°C in a reaction mixture (50 µl) containing the v+I linears, 500 µM each of the 4 dNTPs, and 9 units of T4 polymerase (New England BioLabs), under recommended buffer conditions. After phenol extraction and ethanol precipitation, the repaired v+I linears are dissolved in 20 µl TE. The final ligation to produce circles is carried out in a 50 µl reaction containing 5 µl of v+I linears and 5 units of T4 ligase at 14°C overnight. After 10 min at 70°C the following day, the reaction mixture is stored at -20°C.

This two-stage procedure results in a molecularly random collection of 
single-insert plasmid recombinants with minimal contamination from double-insert 
chimeras (<1%) or free vector (<3%). 
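The following Python sketch illustrates how the v+I band is located in the ladder by listing the expected size windows of the linear species named above. It assumes a pUC18 vector length of 2,686 bp and the 1.6-2.0 kb size-selected inserts described earlier. It ignores insert-insert concatemers, which the text does not list. The function name is hypothetical.

```python
PUC18_BP = 2686                  # length of the pUC18 vector (bp)
INSERT_RANGE_BP = (1600, 2000)   # size-selected fragment window from the text


def ladder(vector_bp, insert_range, max_inserts=3):
    """Return (label, min_bp, max_bp) for the I, v, v+I, v+2I, ... linear species."""
    lo, hi = insert_range
    bands = [("I", lo, hi), ("v", vector_bp, vector_bp)]
    for k in range(1, max_inserts + 1):
        label = "v+I" if k == 1 else f"v+{k}I"
        bands.append((label, vector_bp + k * lo, vector_bp + k * hi))
    return bands


if __name__ == "__main__":
    for label, lo, hi in ladder(PUC18_BP, INSERT_RANGE_BP):
        print(f"{label:>5}: {lo / 1000:.2f}-{hi / 1000:.2f} kb")
```

Under these assumptions, the v+I band spans roughly 4.3-4.7 kb. That window is well separated from the vector band (2.7 kb) and from the v+2I band (about 5.9-6.7 kb).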

Since deviation from randomness can arise from propagation of the DNA in the host, E. coli host cells deficient in all recombination and restriction functions (A. Greener, Strategies 3(1):5 (1990)) are used to prevent rearrangements, deletions, and loss of clones by restriction. Furthermore, transformed cells are plated directly on antibiotic diffusion plates to avoid the usual broth recovery phase, which allows multiplication and selection of the most rapidly growing cells.

Plating is carried out as follows. A 100 µl aliquot of Epicurian Coli SURE II Supercompetent Cells (Stratagene 200152) is thawed on ice and transferred to a chilled Falcon 2059 tube on ice. A 1.7 µl aliquot of 1.42 M beta-mercaptoethanol is added to the aliquot of cells to a final concentration of 25 mM. Cells are incubated on ice for 10 min. A 1 µl aliquot of the final ligation is added to the cells and incubated on ice for 30 min. The cells are heat pulsed for 30 sec at 42°C and placed back on ice for 2 min. The outgrowth period in liquid culture is eliminated from this protocol in order to minimize the preferential growth of any given transformed cell. Instead, the transformation mixture is plated directly on a nutrient-rich SOB plate containing a 5 ml bottom layer of SOB agar (5% SOB agar: 20 g tryptone, 5 g yeast extract, 0.5 g NaCl, 1.5% Difco Agar per liter of medium). The 5 ml bottom layer is supplemented with 0.4 ml of 50 mg/ml ampicillin per 100 ml SOB agar. The 15 ml top layer of SOB agar is supplemented with 1 ml X-Gal (2%), 1 ml MgCl2 (1 M), and 1 ml MgSO4 per 100 ml SOB agar. The 15 ml top layer is poured just prior to plating. Our titer is approximately 100 colonies per 10 µl aliquot of transformation.
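The additive concentrations in the plating protocol follow from simple dilution arithmetic, $C_{final} = C_{stock} V_{stock} / (V_{base} + V_{stock})$. The short Python sketch below applies this formula to the beta-mercaptoethanol and ampicillin additions. The helper is hypothetical and not part of the original protocol.

```python
def final_concentration(stock_conc: float, stock_vol: float, base_vol: float) -> float:
    """Concentration after adding stock_vol of a stock to base_vol of diluent.

    Both volumes must use the same unit; the result is in the unit of stock_conc.
    """
    return stock_conc * stock_vol / (base_vol + stock_vol)


# beta-mercaptoethanol: 1.7 ul of a 1.42 M (1420 mM) stock into 100 ul of cells
bme_mM = final_concentration(1420.0, 1.7, 100.0)

# ampicillin: 0.4 ml of a 50 mg/ml (50,000 ug/ml) stock per 100 ml SOB agar
amp_ug_per_ml = final_concentration(50_000.0, 0.4, 100.0)

print(f"beta-mercaptoethanol: {bme_mM:.1f} mM")
print(f"ampicillin: {amp_ug_per_ml:.0f} ug/ml")
```

The beta-mercaptoethanol works out to about 24 mM, consistent with the approximately 25 mM stated in the protocol. The ampicillin in the bottom layer comes to about 200 µg/ml.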

All colonies are picked for template preparation regardless of size. Thus, 
only clones lost due to "poison" DNA or deleterious gene products are deleted from 
the library, resulting in a slight increase in gap number over that expected. 

3. Random DNA Sequencing

High quality double-stranded DNA plasmid templates are prepared using a "boiling bead" method developed in collaboration with Advanced Genetic Technology Corp. (Gaithersburg, MD) (Adams et al., Science 252:1651 (1991); Adams et al., Nature 355:632 (1992)). Plasmid preparation is performed in a 96-well format for all stages of DNA preparation, from bacterial growth through final DNA purification. Template concentration is determined using Hoechst dye and a Millipore Cytofluor. DNA concentrations are not adjusted, but low-yielding templates are identified where possible and not sequenced.

Templates are also prepared from two Streptococcus pneumoniae lambda genomic libraries. An amplified library is constructed in the vector Lambda GEM-12 (Promega) and an unamplified library is constructed in Lambda DASH II (Stratagene). In particular, for the unamplified lambda library, Streptococcus pneumoniae DNA (>100 kb) is partially digested in a reaction mixture (200 µl) containing 50 µg DNA, 1X Sau3AI buffer, and 20 units Sau3AI for 6 min at 23°C. The digested DNA is phenol-extracted and electrophoresed on a 0.5% low melting agarose gel at 2 V/cm for 7 hours. Fragments from 15 to 25 kb are excised and recovered in a final volume of 6 µl. One µl of fragments is used with 1 µl of DASH II vector (Stratagene) in the recommended ligation reaction. One µl of the ligation mixture is used per packaging reaction following the recommended protocol with the Gigapack II XL Packaging Extract (Stratagene, #227711). Phage are plated directly without amplification from the packaging mixture (after dilution with 500 µl of recommended SM buffer and chloroform treatment). Yield is about 2.5 x 10^3 pfu/µl. The amplified library is prepared essentially as above, except that the Lambda GEM-12 vector is used. After packaging, about 3.5 x 10^4 pfu are plated on the restrictive NM539 host. The lysate is harvested in 2 ml of SM buffer and stored frozen in 7% dimethylsulfoxide. The phage titer is approximately 1 x 10^9 pfu/ml.

Liquid lysates (100 µl) are prepared from randomly selected plaques (from the unamplified library), and template is prepared by long-range PCR using T7 and T3 vector-specific primers.

Sequencing reactions are carried out on plasmid and/or PCR templates using the AB Catalyst LabStation with Applied Biosystems PRISM Ready Reaction Dye Primer Cycle Sequencing Kits for the M13 forward (M13-21) and the M13 reverse (M13RP1) primers (Adams et al., Nature 368:474 (1994)). Dye terminator sequencing reactions are carried out on the lambda templates on a Perkin-Elmer 9600 Thermocycler using the Applied Biosystems Ready Reaction Dye Terminator Cycle Sequencing kits. T7 and SP6 primers are used to sequence the ends of the inserts from the Lambda GEM-12 library, and T7 and T3 primers are used to sequence the ends of the inserts from the Lambda DASH II library.

Sequencing reactions are performed by eight individuals using an average of fourteen AB 373 DNA Sequencers per day. All sequencing reactions are analyzed using the Stretch modification of the AB 373, primarily using a 34 cm well-to-read distance. The overall sequencing success rate is approximately 85% for M13-21 and M13RP1 sequences and 65% for dye-terminator reactions. The average usable read length is 485 bp for M13-21 sequences, 445 bp for M13RP1 sequences, and 375 bp for dye-terminator reactions.

Richards et al., Chapter 28 in AUTOMATED DNA SEQUENCING AND ANALYSIS, M. D. Adams, C. Fields, J. C. Venter, Eds., Academic Press, London (1994), described the value of using sequence from both ends of sequencing templates to facilitate ordering of contigs in shotgun assembly projects of lambda and cosmid clones. We balance the desirability of both-end sequencing (including the reduced cost of a lower total number of templates) against the shorter read lengths of sequencing reactions performed with the M13RP1 (reverse) primer compared to the M13-21 (forward) primer. Approximately one-half of the templates are sequenced from both ends. Random reverse sequencing reactions are done based on successful forward sequencing reactions. Some M13RP1 sequences are obtained in a semi-directed fashion: templates whose M13-21 sequences point outward at the ends of contigs are chosen for M13RP1 sequencing in an effort to specifically order contigs.


4. Protocol for Automated Cycle Sequencing 

The sequencing is carried out using ABI Catalyst robots and AB 373 Automated DNA Sequencers. The Catalyst robot is a commercially available, sophisticated pipetting and temperature control robot that has been developed specifically for DNA sequencing reactions. The Catalyst combines pre-aliquoted templates and reaction mixes consisting of deoxy- and dideoxynucleotides, the thermostable Taq DNA polymerase, fluorescently labelled sequencing primers, and reaction buffer. Reaction mixes and templates are combined in the wells of an aluminum 96-well thermocycling plate. Thirty consecutive cycles of linear amplification (i.e., one-primer synthesis) steps are performed, each including denaturation, annealing of primer and template, and extension, i.e., DNA synthesis. A heated lid with rubber gaskets on the thermocycling plate prevents evaporation without the need for an oil overlay.

Two sequencing protocols are used: one for dye-labelled primers and a second for dye-labelled dideoxy chain terminators. The shotgun sequencing involves the use of four dye-labelled sequencing primers, one for each of the four terminator nucleotides. Each dye-primer is labelled with a different fluorescent dye, permitting the four individual reactions to be combined into one lane of the 373 DNA Sequencer for electrophoresis, detection, and base-calling. ABI currently supplies pre-mixed reaction mixes in bulk packages containing all the necessary non-template reagents for sequencing. Sequencing can be done with both plasmid and PCR-generated templates, with both dye-primers and dye-terminators, with approximately equal fidelity, although plasmid templates generally give longer usable sequences.

Thirty-two reactions are loaded per AB 373 Sequencer each day, for a total of 960 samples. Electrophoresis is run overnight following the manufacturer's protocols, and the data are collected for twelve hours. Following electrophoresis and fluorescence detection, the ABI 373 performs automatic lane tracking and base-calling. The lane tracking is confirmed visually. Each sequence electropherogram (or fluorescence lane trace) is inspected visually and assessed for quality. Trailing sequences of low quality are removed, and the sequence itself is loaded via software to a Sybase database (archived daily to 8 mm tape). Leading vector polylinker sequence is removed automatically by a software program. Average edited lengths of sequences from the standard ABI 373 are around 400 bp and depend mostly on the quality of the template used for the sequencing reaction. ABI 373 Sequencers converted to Stretch Liners provide a longer electrophoresis path prior to fluorescence detection and increase the average number of usable bases to 500-600 bp.
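The two editing steps described above can be sketched in Python: removing leading vector polylinker sequence and removing trailing low-quality sequence. The sketch rests on three assumptions. First, per-base quality values are available, whereas the text describes visual assessment. Second, the vector is located by an exact match to a hypothetical flank sequence. Third, only the run of low-quality bases at the 3' end is trimmed. The flank string and the quality threshold are placeholders, not values from this disclosure.

```python
# Hypothetical vector sequence immediately 5' of the cloning site, as read by
# the sequencing primer. Placeholder only; take the real value from the vector map.
VECTOR_FLANK = "GAGCTCGGTACCC"


def vector_clip_point(read: str, flank: str = VECTOR_FLANK, search_window: int = 120) -> int:
    """Index of the first insert base: just past an exact flank match lying
    within the first `search_window` bases, or 0 if no match is found."""
    idx = read.find(flank, 0, search_window)
    return 0 if idx < 0 else idx + len(flank)


def quality_clip_point(quals: list[int], min_q: int = 20) -> int:
    """Index just past the last base before the trailing run of quality < min_q."""
    end = len(quals)
    while end > 0 and quals[end - 1] < min_q:
        end -= 1
    return end


def edit_read(read: str, quals: list[int]) -> str:
    """Apply leading-vector and trailing-quality clipping; may return ''."""
    left, right = vector_clip_point(read), quality_clip_point(quals)
    return read[left:right] if right > left else ""
```

Trimming only the trailing run is the simplest rule consistent with the description. A window-based rule would tolerate isolated high-quality bases inside a poor 3' tail.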

INFORMATICS

1. Data Management 

A number of information management systems for a large-scale sequencing lab have been developed. (For review see, for instance, Kerlavage et al., Proceedings of the Twenty-Sixth Annual Hawaii International Conference on System Sciences, IEEE Computer Society Press, Washington, D.C., 585 (1993).) The system used to collect and assemble the sequence data was developed using the Sybase relational database management system and was designed to automate data flow wherever possible and to reduce user error. The database stores and correlates all information collected during the entire operation, from template preparation to final analysis of the genome. Because the raw output of the ABI 373 Sequencers was based on a Macintosh platform and the data management system chosen was based on a Unix platform, it was necessary to design and implement a variety of multi-user, client-server applications that allow the raw data as well as analysis results to flow seamlessly into the database with a minimum of user effort.


2. Assembly 

An assembly engine (TIGR Assembler) developed for the rapid and accurate assembly of thousands of sequence fragments was employed to generate contigs. The TIGR Assembler simultaneously clusters and assembles fragments of the genome. In order to obtain the speed necessary to assemble more than 10^4 fragments, the algorithm builds a hash table of 12 bp oligonucleotide subsequences to generate a list of potential sequence fragment overlaps. The number of potential overlaps for each fragment determines which fragments are likely to fall into repetitive elements. Beginning with a single seed sequence fragment, TIGR Assembler extends the current contig by attempting to add the best matching fragment based on oligonucleotide content. The contig and candidate fragment are aligned using a modified version of the Smith-Waterman algorithm, which provides for optimal gapped alignments (Waterman, M. S., Methods in Enzymology 164:165 (1988)). The contig is extended by the fragment only if strict criteria for the quality of the match are met. The match criteria include the minimum length of overlap, the maximum length of an unmatched end, and the minimum percentage match. These criteria are automatically lowered by the algorithm in regions of minimal coverage and raised in regions with a possible repetitive element. Fragments representing the boundaries of repetitive elements and potentially chimeric fragments are often rejected based on partial mismatches at the ends of alignments and excluded from the current contig. TIGR Assembler is designed to take advantage of clone size information coupled with sequencing from both ends of each template. It enforces the constraint that sequence fragments from two ends of the same template point toward one another in the contig and are located within a certain range of base pairs (definable for each clone based on the known clone size range for a given library).
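A highly simplified Python sketch of three ideas from this paragraph follows:

- a 12-mer hash table used to propose candidate overlaps;
- a plain, unmodified Smith-Waterman local alignment score;
- the check that the two end reads of a template point toward each other within the library's size range.

This is an illustration under stated assumptions, not the TIGR Assembler. It indexes only the forward strand and uses arbitrary scoring values, and all names are hypothetical. The size window in the example (1.6-2.0 kb) is taken from the plasmid library described earlier.

```python
from collections import defaultdict
from itertools import combinations

K = 12  # oligonucleotide length used for the hash table, as in the text


def candidate_overlaps(fragments: dict[str, str], k: int = K) -> dict[tuple[str, str], int]:
    """Map each fragment pair sharing at least one k-mer to its shared k-mer count.

    Only the forward strand is indexed; a real assembler also indexes
    reverse complements.
    """
    index = defaultdict(set)  # k-mer -> ids of fragments containing it
    for fid, seq in fragments.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(fid)
    shared = defaultdict(int)
    for ids in index.values():
        for a, b in combinations(sorted(ids), 2):
            shared[(a, b)] += 1
    return dict(shared)


def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score with a linear gap penalty (illustrative scores)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


def mates_consistent(a_pos: int, a_strand: str, b_pos: int, b_strand: str,
                     min_len: int, max_len: int) -> bool:
    """True if two end reads face each other and span an allowed template size.

    Positions are contig coordinates of each read's 5' end; strand is '+' or '-'.
    """
    if a_strand == b_strand:
        return False
    fwd, rev = (a_pos, b_pos) if a_strand == "+" else (b_pos, a_pos)
    return fwd < rev and min_len <= rev - fwd + 1 <= max_len


if __name__ == "__main__":
    frags = {"f1": "ACGTACGTTAGCCATGGATCC", "f2": "TAGCCATGGATCCGGAATTC"}
    print(candidate_overlaps(frags))                   # shared 12-mers per pair
    print(smith_waterman(frags["f1"], frags["f2"]))    # local alignment score
    print(mates_consistent(100, "+", 1900, "-", 1600, 2000))
```

Counting shared 12-mers per pair also gives each fragment's number of candidate partners. The text uses that number to flag fragments likely to lie in repetitive elements.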

The process resulted in 391 contigs as represented by SEQ ID NOs: 1-391. 

3. Identifying Genes

The predicted coding regions of the Streptococcus pneumoniae genome
were initially defined with the program GeneMark, which finds ORFs using a
probabilistic classification technique. The predicted coding region sequences were
used in searches against a database of all nucleotide sequences from GenBank
(October 1997), using the BLASTN search method to identify overlaps of 50 or
more nucleotides with at least 95% identity. Those ORFs with nucleotide
sequence matches are shown in Table 1. The ORFs without such matches were
translated to protein sequences and compared to a non-redundant database of
known proteins generated by combining the Swiss-Prot, PIR and GenPept
databases. ORFs that matched a database protein with a BLASTP probability less
than or equal to 0.01 are shown in Table 2, which also lists assigned functions
based on the closest match in the databases. ORFs that did not match protein or
nucleotide sequences in the databases at these levels are shown in Table 3.
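The sorting of ORFs into Tables 1-3 follows directly from the two thresholds stated above. The sketch below assumes the best BLASTN and BLASTP results for each ORF have already been parsed into a record; the `OrfHits` fields are hypothetical, and no BLAST output format is parsed here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OrfHits:
    """Hypothetical summary of the best database hits for one ORF."""
    name: str
    blastn_overlap: int = 0           # length (nt) of the best BLASTN overlap
    blastn_identity: float = 0.0      # percent identity over that overlap
    blastp_p: Optional[float] = None  # best BLASTP probability; None if no hit


def assign_table(hit: OrfHits) -> int:
    """Return 1, 2 or 3 using the thresholds stated in the text."""
    if hit.blastn_overlap >= 50 and hit.blastn_identity >= 95.0:
        return 1  # nucleotide match to GenBank
    if hit.blastp_p is not None and hit.blastp_p <= 0.01:
        return 2  # protein match to Swiss-Prot/PIR/GenPept
    return 3      # no match at either level


for orf in [OrfHits("orf1", 120, 98.5), OrfHits("orf2", 40, 99.0, 1e-5), OrfHits("orf3")]:
    print(orf.name, assign_table(orf))
```

The nucleotide test is applied first because, as described, only ORFs without a nucleotide match go on to the protein search.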
ILLUSTRATIVE APPLICATIONS

1. Production of an Antibody to a Streptococcus pneumoniae Protein

Substantially pure protein or polypeptide is isolated from the transfected or
transformed cells using any one of the methods known in the art. The protein can
also be produced in a recombinant prokaryotic expression system, such as E. coli,
or can be chemically synthesized. The protein concentration in the final preparation
is adjusted, for example, by concentration on an Amicon filter device, to the level
of a few micrograms/ml. Monoclonal or polyclonal antibody to the protein can
then be prepared as follows.

2. Monoclonal Antibody Production by Hybridoma Fusion 

Monoclonal antibody to epitopes of any of the peptides identified and
isolated as described can be prepared from murine hybridomas according to the
classical method of Kohler, G. and Milstein, C., Nature 256:495 (1975) or
modifications thereof. Briefly, a mouse is repeatedly inoculated
with a few micrograms of the selected protein over a period of a few weeks. The
mouse is then sacrificed, and the antibody-producing cells of the spleen are isolated.
The spleen cells are fused with mouse myeloma cells by means of polyethylene
glycol, and the excess unfused cells are destroyed by growth of the system on selective
media comprising aminopterin (HAT media). The successfully fused cells are
diluted and aliquots of the dilution placed in wells of a microtiter plate where
growth of the culture is continued. Antibody-producing clones are identified by
detection of antibody in the supernatant fluid of the wells by immunoassay
procedures, such as ELISA, as originally described by Engvall, E., Meth.
Enzymol. 70:419 (1980), and modified methods thereof. Selected positive clones
can be expanded and their monoclonal antibody product harvested for use. Detailed
procedures for monoclonal antibody production are described in Davis, L. et al.,
Basic Methods in Molecular Biology, Elsevier, New York, Section 21-2 (1989).
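The dilution step above relies on wells receiving, on average, few enough fused cells that a growing well is likely to derive from a single cell. Assuming cells are distributed among wells at random (a Poisson model, which the text does not state) and that every seeded cell grows, the expected well composition for a chosen mean number of cells per well can be sketched as follows; the mean itself is a hypothetical parameter, since no dilution factor is given.

```python
import math


def poisson_pmf(k: int, mean: float) -> float:
    """Probability that a well receives exactly k cells."""
    return mean ** k * math.exp(-mean) / math.factorial(k)


def limiting_dilution_summary(cells_per_well: float) -> dict:
    """Expected fractions of wells by number of cells seeded (Poisson model)."""
    if cells_per_well <= 0:
        raise ValueError("cells_per_well must be positive")
    empty = poisson_pmf(0, cells_per_well)
    single = poisson_pmf(1, cells_per_well)
    return {
        "empty": empty,
        "single": single,
        "multiple": 1.0 - empty - single,
        # Among wells that show growth, the share seeded by exactly one cell
        # (assumes every seeded cell grows).
        "clonal_among_growing": single / (1.0 - empty),
    }


for mean in (0.3, 1.0):
    print(mean, round(limiting_dilution_summary(mean)["clonal_among_growing"], 3))
```

Under these assumptions a lower mean raises the chance that a positive well is monoclonal, at the cost of more empty wells.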




3. Polyclonal Antibody Production by Immunization 

Polyclonal antiserum containing antibodies to heterogeneous epitopes of a
single protein can be prepared by immunizing suitable animals with the expressed
protein described above, which can be unmodified or modified to enhance
immunogenicity. Effective polyclonal antibody production is affected by many
factors related both to the antigen and the host species. For example, small
molecules tend to be less immunogenic than others and may require the use of
carriers and adjuvant. Also, host animals vary in response to site of inoculation
and dose, with either inadequate or excessive doses of antigen resulting in low-titer
antisera. Small doses (ng level) of antigen administered at multiple intradermal
sites appear to be most reliable. An effective immunization protocol for rabbits
can be found in Vaitukaitis, J. et al., J. Clin. Endocrinol. Metab. 33:988-991
(1971).

Booster injections can be given at regular intervals, and antiserum harvested
when its antibody titer, as determined semi-quantitatively, for example, by
double immunodiffusion in agar against known concentrations of the antigen,
begins to fall. See, for example, Ouchterlony, O. et al., Chap. 19 in: Handbook of
Experimental Immunology, Weir, D., ed., Blackwell (1973). Plateau concentration
of antibody is usually in the range of 0.1 to 0.2 mg/ml of serum (about 12 µM).
Affinity of the antisera for the antigen is determined by preparing competitive
binding curves, as described, for example, by Fisher, D., Chap. 42 in: Manual of
Clinical Immunology, second edition, Rose and Friedman, eds., Amer. Soc. for
Microbiology, Washington, D.C. (1980).

Antibody preparations prepared according to either protocol are useful in
quantitative immunoassays which determine concentrations of antigen-bearing
substances in biological samples; they are also used semi-quantitatively or
qualitatively to identify the presence of antigen in a biological sample. In addition,
antibodies are useful in various animal models of pneumococcal disease as a means
of evaluating the protein used to make the antibody as a potential vaccine target or
as a means of evaluating the antibody as a potential immunotherapeutic or
immunoprophylactic reagent.
4. Preparation of PCR Primers and Amplification of DNA

Various fragments of the Streptococcus pneumoniae genome, such as those
of Tables 1-3 and SEQ ID NOs: 1-391, can be used, in accordance with the present
invention, to prepare PCR primers for a variety of uses. The PCR primers are
preferably at least 15 bases, and more preferably at least 18 bases, in length. When
selecting a primer sequence, it is preferred that the two primers of a pair have
approximately the same G/C ratio, so that their melting temperatures are
approximately the same. The PCR primers and amplified DNA of this Example find
use in the Examples that follow.
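The primer guidelines above (a length of at least 15, preferably 18, bases and matched G/C content so that melting temperatures agree) can be checked with a short sketch. The Tm estimate uses the common basic formula Tm = 64.9 + 41 x (nGC - 16.4) / N, an approximation chosen here rather than one named in the text; the 5 degree C tolerance is likewise an assumption.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a primer."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)


def basic_tm(seq: str) -> float:
    """Approximate melting temperature (deg C): 64.9 + 41 * (nGC - 16.4) / N."""
    seq = seq.upper()
    n_gc = sum(base in "GC" for base in seq)
    return 64.9 + 41.0 * (n_gc - 16.4) / len(seq)


def primer_pair_ok(fwd: str, rev: str, min_len: int = 18,
                   max_tm_gap: float = 5.0) -> bool:
    """Check the length preference and that the two Tm estimates roughly agree."""
    if len(fwd) < min_len or len(rev) < min_len:
        return False
    return abs(basic_tm(fwd) - basic_tm(rev)) <= max_tm_gap


fwd = "CCAAGCAAAACCAGCTACAG"  # first 20 bases of SEQ ID NO: 1
print(gc_fraction(fwd), round(basic_tm(fwd), 2))  # 0.5 51.78
```

Salt- and nearest-neighbor-corrected Tm models give different absolute values; for the purpose stated in the text, matching the two primers of a pair, a simple composition-based estimate is usually adequate.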


5. Gene Expression from DNA Sequences Corresponding to ORFs

A fragment of the Streptococcus pneumoniae genome provided in Tables 1-3
is introduced into an expression vector using conventional technology.
Techniques to transfer cloned sequences into expression vectors that direct protein
translation in mammalian, yeast, insect or bacterial expression systems are well
known in the art. Vectors and expression systems are commercially available
from a variety of suppliers, including Stratagene (La Jolla, California),
Promega (Madison, Wisconsin), and Invitrogen (San Diego, California). If
desired, to enhance expression and facilitate proper protein folding, the codon
context and codon pairing of the sequence may be optimized for the particular
expression organism, as explained by Hatfield et al., U.S. Patent No. 5,082,767,
incorporated herein by this reference.
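Codon pair optimization starts from knowing which adjacent codon pairs an ORF actually uses. The sketch below only tallies codon pairs and flags those that are rare in the intended host according to a supplied frequency table. It does not reproduce the method of Hatfield et al.; the host table, its values and the threshold are hypothetical inputs.

```python
from collections import Counter


def codons(orf: str) -> list:
    """Split an in-frame ORF into codons."""
    orf = orf.upper()
    if len(orf) % 3:
        raise ValueError("ORF length must be a multiple of 3")
    return [orf[i:i + 3] for i in range(0, len(orf), 3)]


def codon_pair_counts(orf: str) -> Counter:
    """Count each pair of adjacent codons in reading order."""
    cs = codons(orf)
    return Counter(zip(cs, cs[1:]))


def rare_pairs(orf: str, host_pair_freq: dict, threshold: float) -> list:
    """Codon pairs used by the ORF whose host frequency is below threshold.

    host_pair_freq is a hypothetical table mapping (codon, codon) -> frequency
    in the intended expression host; pairs missing from it count as 0.
    """
    return [pair for pair in codon_pair_counts(orf)
            if host_pair_freq.get(pair, 0.0) < threshold]


host = {("ATG", "AAA"): 0.5, ("AAA", "AAA"): 0.001}  # made-up frequencies
print(rare_pairs("ATGAAAAAATAA", host, threshold=0.01))
```

Flagged pairs would be candidates for synonymous recoding; choosing the replacement codons is the part that a real optimization method supplies.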
The following is provided as one exemplary method to generate
polypeptide(s) from cloned ORFs of the Streptococcus pneumoniae genome
fragment. Bacterial ORFs generally lack a poly A addition signal. The addition
signal sequence can be added to the construct by, for example, splicing out the poly
A addition sequence from pSG5 (Stratagene) using BglI and SalI restriction
endonuclease enzymes and incorporating it into the mammalian expression vector
pXT1 (Stratagene) for use in eukaryotic expression systems. pXT1 contains the
LTRs and a portion of the gag gene of Moloney Murine Leukemia Virus. The
positions of the LTRs in the construct allow efficient stable transfection. The
vector includes the Herpes Simplex thymidine kinase promoter and the selectable
neomycin gene. The Streptococcus pneumoniae DNA is obtained by PCR from the
bacterial vector using oligonucleotide primers complementary to the Streptococcus
pneumoniae DNA, with a PstI restriction site incorporated into the 5' primer and a
BglII site at the 5' end of the corresponding 3' primer, taking care to ensure that the
Streptococcus pneumoniae DNA is positioned such that it is followed by the poly
A addition sequence. The purified fragment obtained from the resulting PCR
reaction is digested with PstI, blunt ended with an exonuclease, digested with
BglII, purified and ligated to pXT1, now containing a poly A addition sequence
and digested with BglII.
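The primer design described above, with a PstI site added to the 5' primer and a BglII site at the 5' end of the 3' primer, can be sketched as follows. The recognition sequences CTGCAG (PstI) and AGATCT (BglII) are standard; the two-base 5' clamp, the 18-base annealing length and the helper names are assumptions made for illustration.

```python
PSTI_SITE = "CTGCAG"   # PstI recognition sequence (cuts CTGCA^G)
BGLII_SITE = "AGATCT"  # BglII recognition sequence (cuts A^GATCT)
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(_COMPLEMENT)[::-1]


def cloning_primers(target: str, anneal_len: int = 18, clamp: str = "GC"):
    """Return (5' primer, 3' primer) carrying PstI and BglII sites.

    Both sites are palindromic, so checking one strand of the target for an
    internal site is sufficient.
    """
    target = target.upper()
    for site in (PSTI_SITE, BGLII_SITE):
        if site in target:
            raise ValueError(f"target contains an internal {site} site")
    fwd = clamp + PSTI_SITE + target[:anneal_len]
    rev = clamp + BGLII_SITE + reverse_complement(target[-anneal_len:])
    return fwd, rev


orf = "ATG" + "GCT" * 10 + "TAA"  # toy sequence, not from the listing
print(cloning_primers(orf))
```

The internal-site check matters because a PstI or BglII site inside the insert would be cut during the digestion steps described in the text.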

The ligated product is transfected into mouse NIH 3T3 cells using
Lipofectin (Life Technologies, Inc., Grand Island, New York) under conditions
outlined in the product specification. Positive transfectants are selected after
growing the transfected cells in 600 µg/ml G418 (Sigma, St. Louis, Missouri).
The protein is preferably released into the supernatant. However, if the protein has
membrane binding domains, the protein may additionally be retained within the cell
or expression may be restricted to the cell surface. Since it may be necessary to
purify and locate the transfected product, synthetic 15-mer peptides based on the
amino acid sequence predicted from the Streptococcus pneumoniae DNA sequence
are injected into mice to generate antibody to the polypeptide encoded by the
Streptococcus pneumoniae DNA.
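Designing the synthetic 15-mer peptides requires translating the ORF and choosing 15-residue windows. The sketch uses the standard genetic code, translates from the first codon until a stop, and tiles windows with a hypothetical step of 5 residues. It assumes an unambiguous A/C/G/T sequence and does not treat alternative bacterial start codons (GTG, TTG) as methionine.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(_BASES, repeat=3), _AMINO_ACIDS)}


def translate(orf: str) -> str:
    """Translate from the first codon up to (not including) the first stop."""
    orf = orf.upper()
    protein = []
    for i in range(0, len(orf) - 2, 3):
        amino_acid = CODON_TABLE[orf[i:i + 3]]  # KeyError on non-ACGT input
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def tile_peptides(protein: str, length: int = 15, step: int = 5) -> list:
    """Overlapping windows of `length` residues, starting every `step` residues."""
    return [protein[i:i + length]
            for i in range(0, len(protein) - length + 1, step)]


print(translate("ATGAAATTTTAA"))  # MKF
```

In practice the windows would then be screened for properties such as hydrophilicity before synthesis; that selection is outside this sketch.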
Alternatively, if antibody production is not possible, the Streptococcus
pneumoniae DNA sequence is additionally incorporated into eukaryotic expression
vectors and expressed as, for example, a globin fusion. Antibody to the globin
moiety then is used to purify the chimeric protein. Corresponding protease
cleavage sites are engineered between the globin moiety and the polypeptide
encoded by the Streptococcus pneumoniae DNA so that the latter may be freed
from the chimeric protein by simple protease digestion. One useful expression vector
for generating globin chimeras is pSG5 (Stratagene). This vector encodes a rabbit
globin. Intron II of the rabbit globin gene facilitates splicing of the expressed
transcript, and the polyadenylation signal incorporated into the construct increases
the level of expression. These techniques are well known to those skilled in the art
of molecular biology. Standard methods are published in methods texts such as
Davis et al., cited elsewhere herein, and many of the methods are available from the
technical assistance representatives from Stratagene, Life Technologies, Inc., or
Promega. Polypeptides of the invention also may be produced using in vitro
translation systems such as the In Vitro Express™ Translation Kit (Stratagene).

While the present invention has been described in some detail for purposes
of clarity and understanding, one skilled in the art will appreciate that various
changes in form and detail can be made without departing from the true scope of
the invention.

All patents, patent applications and publications referred to above are 
hereby incorporated by reference. 
(1) GENERAL INFORMATION:

(i) APPLICANT: Charles Kunsch 

Gil H. Choi 
Patrick S. Dillon 
Craig A. Rosen 
Steven C. Barash 
Michael R. Fannon 
Brian A. Dougherty 

(ii) TITLE OF INVENTION: Streptococcus pneumoniae Polynucleotides and Sequences 

(iii) NUMBER OF SEQUENCES: 391 

(iv) CORRESPONDENCE ADDRESS: 

(A) ADDRESSEE: Human Genome Sciences, Inc. 

(B) STREET: 9410 Key West Avenue 

(C) CITY: Rockville 
(D) STATE: Maryland

(E) COUNTRY: USA

(F) ZIP: 20850 

(v) COMPUTER READABLE FORM: 

(A) MEDIUM TYPE: Diskette, 3.50 inch, 1.4Mb storage 

(B) COMPUTER: HP Vectra 486/33

(C) OPERATING SYSTEM: MSDOS version 6.2 

(D) SOFTWARE: ASCII Text 



(vi) CURRENT APPLICATION DATA:

(A) APPLICATION NUMBER:

(B) FILING DATE: 

(C) CLASSIFICATION: 

(vii) PRIOR APPLICATION DATA: 

(A) APPLICATION NUMBER: 

(B) FILING DATE: 

(viii) ATTORNEY/AGENT INFORMATION:

(A) NAME: Brookes, A. Anders 

(B) REGISTRATION NUMBER: 36,373 

(C) REFERENCE/DOCKET NUMBER: PB340P1

(ix) TELECOMMUNICATION INFORMATION:

(A) TELEPHONE: (301) 309-8504 

(B) TELEFAX: (301) 309-8512 
(2) INFORMATION FOR SEQ ID NO: 1:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 5625 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:

CCAAGCAAAA CCAGCTACAG CTAAAGGAAC TTACGTAACA AACTTGACTA TCACAACTAC 60

TCAAGGTGTT GGTATCAAAG TTGACGTAAA CTCACTTTAA TCAGTAGTTA AAGTAATGTA 120

AAAAAGTTGA AGACGCTATG TCTCAACTTT TTTTGATGTA CGACGGGCAT GTTGTATAGT 180

AGATGTGTAC TATTCTAGTT TCAATCTACT ATAGTAGCTC AGAAGTCGGT ACTTAAACGT 240

GCTATATCAA AACCAGTCCT TGAAAAACGT GGACTGGTTT CGTGTTTGGA TTATTACCTT 300

GAACGACATG CGTTAAAAGT TAGTTGAACC GCCGTATGCC GAACGGACGT ACGGTGGTGT 360

GAGAGGGGCT AGAGATTATC CCCTACTCGA TTTCGAAATC TAGTGGAATG AATCTGGAAT 420

AGTCCATCGA GCTTTCTAAT ACTCTTCGAA AATCTCTTCA AACCACGTCA ACGTCGCCTT 480

GCCGTGCGTA TGGTTACTGA CTTCGTCAGT TCTATCCACA ACCTCAAAAC AGTGTTTTGA 540

GCTGACTACG TCAGTTCCAT CTACAACCTC AAAACAGTGT TTTGAGCAAC CTGCGGCTAG 600

TTTCCTAGTT TGCTCTTTGG TTTTCATTGA GTATAACACA TTGTTAGAAG TTGGTTTAAA 660

TTTCCTAATC AGTTTGTTCA CATTTACCTT CGATATATTA TATCCCATAG TTAAGGTTGG 720

TCATACAGAT GATTATAGTC ATGGAGCCGT AAAACTTAGT GTTTCTTTAG TTGACAAAGA 780

TGCCATGAAA AAAATATTTG TAACTGTAAT AGGATATTTT GAAATAAATA TAGATGAAAA 840

TATCACCGAT ATTCTATACG TAAATGGTAC TGCTATTCTT TATCTTTATT TACGTTCAAT 900

TGTTTCAATA GTTTCGGCAA TTGATAGCAG TGAAGCAATG TTGCTACCTA TCATTAATGT 960

TTTAGAGTTA CTAGATAAAT CTCAACCTTT TGAAGAAGAA TAATTTATTA GCTCACTAAA 1020

TTGAGGGTAA GGAAAAGTAA AAGCAGTAAG AAAAATGTCT TGCATTATAC AGCAACCTTT 1080

TGGGAATGAG TGGATGGATT GAATAAAATT TGATTAAGAG TGGATGATTT ATCTGTAGAT 1140

TATTATTGGA CAGTTAGTCT TGAAGTAGTC TAAGAATTAG GTTATAATCA GTAGAAGCCT 1200

TGCTAATAAT GAGGAGGTTA GTTTATGTAT AGTAGACTGA ATCTAAAATA GTACGAAACA 1260

ATTGCTAAAA CATTTATAGA AATTAATTTT ACTTTCCCAA TCGATTTGTT CTCATCTTAT 1320

TTCAATCCGC TATATATTAT GGTATCGAAT CTTCATCAGA ATGATAAAAT TAATCAATTG 1380

ATATCTGATT ACAAACAGAA TATGAAAGCT TTTTATATCA CTATTGAAAA ATTTATACGA 1440
GATGATGAAA GCCTTAAGTG TTATTTTATA AAGGTTATTT CAAGTCGTTC CAAGGTAACA 1500

AGTCTAGATC AGATTGAAGC TGATAAAACG ATACAAAGAA AATATTCAAG TGAGCTAAAA 1560

AAATTTATTG GATTTTATAA TGAGATTATT TGTGAGGAAA ATAGTTTCCT ACATGTACGA 1620

AAGAGGTGGT CGAGTTGGTT TAGGTAGTCG ATGCGTGAGT TGATAATTCT CAGGGTATGG 1680

ACTTCTTTTT CATGAATGAG GTAAAAGAGC AGGTATTGTT TAGAGACAAT CATTCTGAGC 1740

ATATTTTCTG GATAGAGGGA GTATCCGATT TTATGATCAA AGTTAATACC GCCCTCTGGT 1800

GAGAAGATGA GTAGGTTGGT AATTTAAACT ATTAAACAGA ATTTTTGATT AAAAGTATTA 1860

TTTCATGAGA GAAATCCTAA TTTCACAATC CATAGGCAAA CGCTTGCATT TCGTTTTTTA 1920

TTGGACTATA ATAGGTTGGT ATAAAGCCTT CTGTAGTAAT AAAATGTAGA AGGTGTAGAA 1980

AGTAAGGATT TAGAATATTT GTAGTTAAAA ACACAATGTT GCTATTCCTT ACGATAGGGA 2040

GATAGATATG GCAATGATAG AAGTGGAACA TCTTCAGAAA AATTTTGTGA AGACTGTTAA 2100

GGAACCGGGC TTGAAGGGGG CTTTGCGCTC CTTTATTCAT CCTGAAAAGC AGACCTTTGA 2160

AGCGGTCAAG GATTTGACCT TTGAGGTTCC AAAAGGGCAG ATTTTAGGAT TTATCGGGGC 2220

AAATGGTGCT GGGAAGTCGA CAACCATTAA AATGCTGACA GGAATTTTGA AACCAACATC 2280

TGGTTTTTGT CGGATTAACG GCAAGATTCC CCAGGACAAT CGGCAAGATT ATGTCAAAGA 2340

TATTGGCGTA GTCTTTGGAC AACGCACCCA GCTATGGTGG GATTTGGCTC TGCAAGAGAC 2400

CTACACTGTC TTAAAAGAGA TTTATGATGT GCCAGACTCG CTCTTTCATA AGCGTATGGA 2460

CTTTTTGAAT GAAGTCTTGG ATTTGAAGGA CTTTATCAAG GATCCCGTGC GGACTCTTTC 2520

ACTGGGACAA CGGATGCGGG CGGATATTGC GGCCTCCTTG CTCCACAATC CCAAGGTTCT 2580

TTTTTTAGAT GAGCCGACCA TTGGTTTGGA CGTTTCGGTT AAGGATAATA TTCGTCGGGC 2640

AATTACTCAG ATCAATCAAG AGGAAGAAAC TACCATTCTT TTGACCACTC ACGATTTGAG 2700

TGATATTGAG CAACTTTGTG ATCGGATTTT CATGATTGAC AAGGGGCAAG AGATTTTTGA 2760

TGGAACGGTG AGCCAACTCA AGGAGACCTT TGGTAAGATG AAGACTCTCT CTTTTGAACT 2820

GCTACCAGGT CAAAGTCATC TCGTCTCTCA CTATGACGGT CTGTCTGATA TGACCATTGA 2880

TAGACAAGGA AACAGCCTCA ACATTGAATT TGATAGTTCT CGCTACCAGT CAGCTGACAT 2940

TATCAAGCAA ACCCTGTCTG ATTTTGAAAT CCGCGATTTG AAGATGGTGG ATACGGATAT 3000

TGAGGATATT ATCCGTCGCT TCTACCGAAA GGAGCTCTAG GATGATCAAA TTGTGGAGAC 3060

GTTATAAACC CTTTATCAAT GCAGGGGTTC AGGAGTTGAT TACTTACCGA GTCAACTTTA 3120

TTCTCTATCG GATTGGCGAT GTCATGGGGG CTTTTGTGGC CTTTTATCTC TGGAAGGCTG 3180
GATGCCAGTC AGATTCTTCA GGCACTCCTT TTGCAGTTCT 3780

TCTGGCTCTT AGTGATGGTG GGATTGTCTC AGTTAATTTG GAAACGGGTC CAGTCCTTTA 3840

TCACCATTCA AGGAGGTTAG TATGAAAAAA TATCAACGAA TGCATCTGAT TTTTATCAGA 3900

CAATACATCA AACAAATCAT GGAATATAAG GTAGATTTTG TGGTTGGTGT CTTGGGAGTC 3960

TTTCTGACTC AAGGCTTGAA TCTCTTGTTT CTCAATGTCA TCTTTCAACA TATTCCATTC 4020

CTAGAAGGCT GGACCTTTCA AGAGATAGCT TTCATTTATG GATTTTCCTT GATTCCCAAG 4080

GGAATGGACC ATCTCTTTTT TGACAATCTC TGGGCACTAG GGCAACGCCT AGTCCGAAAA 4140

GGGGAGTTTG ACAAGTATCT GACTCGTCCC ATCAATCCTC TCTTTCACAT CCTAGTTGAA 4200

ACCTTTCAGA TTGATGCCTT GGGTGAACTC TTAGTCGGTG GTATTTTATT GGGAACAACA 4260

GTGACCAGCA TTGTTTGGAC TCTTCCAAAA TTCCTGCTTT TCCTAGTTTG TATTCCTTTT 4320

GCGACCTTGA TTTATACTTC TCTTAAAATC GCAACAGCCA GTATCGCCTT TTGGACTAAG 4380

CAGTCAGGCG CCATGATTTA CATCTTCTAT ATGTTCAATG ACTTTGCTAA GTATCCGATT 4440

TCTATTTACA ATTCTCTTCT TCGTTGGTTG ATTAGCTTTA TCGTGCCTTT CGCCTTTACA 4500

GCCTACTATC CAGCTAGCTA TTTCTTACAG GAAAAGGATG TGTTCTTTAA CGTAGGAGGT 4560

TTGATGTTGA TTTCTCTGGT TTTCTTTGTT ATTTCCCTTA AACTTTGGGA TAAGGGCTTA 4620

GATTCCTACG AAAGTGCGGG TTCGTAAAAG CTAAAGTAAG ACTAAAATCA AGAAAGAAAC 4680

TTATGATGTT TGTAATTGAA GAAGTCAAGG ATGAAAATCA AAAAAAGGCA GTTGTCGCTG 4740

AGGTTTTGAA GGATTTGCCA GAATGGTTTG GAATCCCAGA AAGCACACAA GCCTATATAG 4800

AAGGAACCAC GACACTGCAA GTTTGGACCG CCTATCAGGA GAGTGATTTG ACTAGATTTG 4860

TAAGCTTATC CTATTCGAGT GAAGATTGTG CAGAGATTGA TTGTCTCGGC GTAAAAAAGC 4920

TTATCAAGGT AGAAAAATTG GGAGCCAATT GCTTGCTACT TTAGAGAGTG AAGCTCGTAA 4980
AAAAGTTGGT TATCTGCAGG TCAAAACAGT GGCAGAAGGT TCTAATAAAG ATTATGATCG 5040

AACAAATGAC TTTTATCGAG GTCTTGGCTT TAAAAAGTTA GAGATTTTTC CTCAACTATG 5100

GAATCCGCAA AATCCTTGTC AGATTTTGAT TAAAAAGCTT GAATAATATT ACTTGACATC 5160

TATTCTCAGA GTGCTATACT GTAAGTGTAA TCGCCGATTT AGCTTAGTTG GTAGAGCAAG 5220

GCACTCGTAA AGCCTAGGTT ATAGGTAGAT AAACGACTGA GGATTTGAAA AAATAGATAG 5280

GTAGAAGATA ACCGTTAAGC CTTACTCTTA GCGGTTATTT ATATTGTTTA ATAGCGCTAA 5340

TATTTTATCA ATTATGCCTG TTTTCGTGTT TCTGGTAGTT GTTCAAGTTT ATTGCTACTA 5400

TTTTTGATGG TATGAATGTG CTTATAATGT ATCCCGGTTA ACGAAAGTTT TGGACTTATA 5460

CTCTTCGAAA ATCTCTTCAA ACCACGTCAA CGTCGCCTTG CCGTGCGTAT GGTTATGACT 5520

TCGTCAGTTC TATCCACAAC CTCAAAACAG TGTTTTGAGT GACTACGTCA GTTCCATCTA 5580

CAACCTCAAA ACACTGTTTT GCCCAATCTG CGGCTAGTTT CCTAG 5625
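Each line of the sequence listing holds up to six blocks of ten bases followed by the position of its last base, so extraction damage such as a split block or a split position number can be detected mechanically. A minimal sketch of such a check, assuming a contiguous listing that starts at position 1 (the function name is hypothetical):

```python
def check_listing(lines):
    """Verify block sizes and running positions in sequence-listing lines.

    Returns the total number of bases; raises ValueError on the first problem.
    """
    total = 0
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue  # blank separator lines
        *blocks, position = tokens
        for i, block in enumerate(blocks):
            # Every block except the last on a line must be exactly 10 bases.
            full_length_required = i < len(blocks) - 1
            bad_length = len(block) > 10 or (full_length_required and len(block) != 10)
            if bad_length or set(block.upper()) - set("ACGTN"):
                raise ValueError(f"malformed block {block!r}")
            total += len(block)
        if not position.isdigit() or int(position) != total:
            raise ValueError(f"position {position!r} does not match {total}")
    return total


listing = [
    "CCAAGCAAAA CCAGCTACAG CTAAAGGAAC TTACGTAACA AACTTGACTA TCACAACTAC 60",
    "",
    "TCAAGGTGTT GGTATCAAAG TTGACGTAAA CTCACTTTAA TCAGTAGTTA AAGTAATGTA 120",
]
print(check_listing(listing))  # 120
```

A listing with missing lines, as happens when a page is lost in extraction, also fails because the running count no longer matches the printed positions.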



(2) INFORMATION FOR SEQ ID NO: 2:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 7571 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2: 

CTCTCCAGCT TTCCTTGCGA GTTGGCCATG TTGTGTCTTT AAGAAGTCTA AAAATATCTC 60

CAATAAAACG CATCGCTCTC TCCTATCTCG TTTCTCTGTG TGTAGTGTAC TTGCCACAAT 120

GCTTACAAAA TTTATTTACT TCTAGTCGTG TAGGCTTGAG GTTTCCGCTG ATCTTGATTG 180

AATAGTTTCT CGAACCACAA ACCGCACAAG CTAGGCTTGC TTTTTTTAGT GCCATAACGC 240

CTCCATCTTA TCCATTATAA CAAGAAAGCT AGGCTTTGAC AAGCATCTTA GCGAAATAGA 300

TTGACTATCG AATCCCATAT TGTTTGAGCC TTTTCCTTAA TCTTCGCATC TGAGATAGCC 360

CGGCTAGCCT CATCTACTAG ACTTTGCGCA CGCCCTCGAA TATCAGACAA ATTATCATCT 420

GTCTGGCTAT TATCATTGGT TTGTACTTGT CTTTTTGTAT TGGCTGGTGC AATTCCATTT 480

TGCTTATAAG CATTTTCAAC CGTAAAGGTA CTTCCTGGCG TATAAGGTAA AATGGTATTG 540

GCAATGTTTC TAAAGACATG AGCTGCACCG TTTGAAGTAG AGCCAGCTAG ATAGTGGTTT 600

TCATCAGTGG TCGGAAAGCC AAGCCAGTGG CTAATCACTA CATCCGGAGT ATAACCAATT 660

ACCCACTGGT CACTTGTGTA CTCCGGATTG AAAACTGCTT CAGTTGTTCC AGTTTTCCCT 720

GCCATGACAT AGTCTGCAGG CGATGAACTA ATACCGGTAC CGTTGGTGAA AGTCCCCAAC 780

ATCATACTGG TCATCTTGTC AGCTACAGAC TTATCAATCA CCCGTTTTTG TGAATTTTTA 840

TGACTCGCAA TAACTTGTCC ACTAGCATTT TCAATTCTAC TAATAAAATG AGCTTCAGGC 900

ATTAAACCTT CATTTGCAAA GGCGGCGTAT GCTTGAGCCA TTTGAAGAGG GTTGGTTTCA 960

ACACCGCTTC CCAAGGCGAC ACCAAGAACA CGGTCGACCT TTTCCATGTT GAGTCCGAAT 1020

TTTTCGCCTG CCTCAAAAGC CTTGTCGACA CCCAAATCAT TAACAGTGGC AACAGCAGGT 1080

AGATTAAGCG ATTCTGCCAA GGCTTGATAC ATAGGAACTT CTCGACTCGT TTTGATCCCT 1140

GCATAGTTAT CAACCTTATA GCTGTCATAC TGCATGGTAT GGTTATCCAA CTGCTTATTC 1200

AAAGCCCAGC TTGCTTCAAC TGCTGGCGTA TAAACAACTA AAGGCTTAAT TGTAGAACCA 1260

GGACTACGCT TTGATTGGGT TGCATAGTTG AAATTCCGGA ATCCAGTTTT ATCATTGTCA 1320

GCAACTTGAC CGACAACTCC ACGAACTCCC CCTGTTTTCG GTTCGAGGGC TACACTTCCT 1380

GATTGAGCAA ACGTTCCATC CTCTGCCCTC GGAAATAGCG ATGTGTTTTC ATAAACAATC 1440

TGCATATTTG CTTGGTAGTT TTGGTCCAGC TCTGTGTAAA TGCGGTAGCC ATTATTGACA 1500

ATCTCTTCCT CTGTTAGATT ATACTTGGAA ACAGCTTCAT TAACCACCGC ATCAAAATAA 1560

GAGGGGTAAC GGTAATCTGA GATTTTTCCT TCATACTTAT CGTGCAATTG CGAAGTCATA 1620

TCAACTTCAG CAGCTTTGGT TTCTTGGTTT TTATCAATAT ATCCTGCTGC AACCATATTC 1680

TGCAAGACAG TATCGCGCCG ATTAGTAGAA TCTTCTACGG AATTCAAGGG ATTATACAGT 1740

TCCGGCCCCT TGAGCATCCC TGCCAGAGTC GCAGCTTGAT CCAGACTCAC TTCTGATGCA 1800

GAAACTCCAA AGTATTTCTT ACTCGCATCT TCTACACCCC ACACACCATT TCCAAAATAA 1860

GCGTTGTTAA GGTACATGGT TAGAATTTGC TCCTTACTAT ATTTTTTGCT TAATTCTAAG 1920

GCAAGGAAAA ATTCTTTCGC TTTTCTCTCA ACAGTTTGAT CCTGCGATAA ATAGGCGTTT 1980

TTAGCCAGCT GTTGGGTAAT GGTAGAGCCA CCACCTGAAC GTCCAGCAGT GACAATAGCC 2040

AAGAAAAAAC GGCCATAGTT AATCCCGTCA TTTTTATAGA AAGAACGGTC TTCTGTCGCA 2100

ATAACAGCAT TCTGCAAGTT TTTACTGATG TCAGTCAGCT CAACATAGGT TCCCTTTTGA 2160

CCAGACAAGG CACCAGCCTC TTTTTCTTCA CGGTCAAAAA TAAGAGTCCG AGTTTTCAAG 2220

GCATTTTGCA AATCATTGAC ATTGGTCGAC TTGGCTACAG CAAACAAATA GATTCCAACT 2280

AGCAAGCCTG CACTCAAACC TAGTATAAGG ATAATCTTTG TTAGATGATA ACGACGCCAG 2340

AATTTTCGAA TCGGACCTAC TTGGGCTAAT TTTTTTCGAT CACTACGAGA GCGACGTAAG 2400

ATAGTAGAAT CAGAGTCCTC TAGTTCACTT GTTTCTTTTT TAAAAAGAGA AAGAAATTTC 2460

TCAAATAATT TATCTAATTT CATGCGTTTA TTTTATCATC TTCATCATAG GAAGACAAGA 2520
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ATTTAGCTAT 


TTCCTATCCA 


AATAGGGCTT 


TTTTTGTTAC 


AATATCTGTA 


TGCAATTCAC 


2580 


ATTTACATTA 


CCCGCCTCTC 


TACCTCAAAT 


GACAGTAAAG 


CAATTACTTG 


AGGAACAACT 


2640 


CCTCATCCCT 


AGAAAAATCC 


GTCATTTTTT 


GAGAATCAAG 


AAACATATTT 


TGATAAATCA 


2700 


AGAAGAAGTC 


CACTGGAAGG 


AAATCGTAAA 


T C CTGG AG AT 


GTTTGCCAGT 


TGACTTTTGA 


2760 


CGAGGAAGAT 


TATTCCCAAA 


AGACGATCCC 


TTGGGGCAAC 


CC AG ACT TAG 


TGCAGGAAGT 


2820 


TTATCAAGAT 


CAACACTTGA 


TTATTGTAAA 


CAAACCAGAG 


GGGATGAAAA 


CGCATGGTAA 


2880 


TCAACCAAAC 


GAAATTGCCC 


TTCTTAACCA 


TGTCAGTACC 


TATGTTGGCC 


AAACCTGCTA 


2940 


TGTCGTTCAT 


CGTCTGGACA 


TGGAAACCAG 


TGGCTTAGTT 


CTCTTTGCCA 


AAAATCCTTT 


3000 


TATCCTGCCC 


ATTCTCAATC 


GCTTATTGGA 


GAAAAAAGAG 


ATTTCTAGAG 


AATATTGGGC 


3060 


TCTAGTTGAT 


GGAAATATCA 


ACAGAAAAGA 


ACTTGTTTTC 


AGAGACAAAA 


TTGGACGTGA 


3120 


TCGCCATGAT 


CGTAGAAAAA 


GAATAGTTGA 


TGCAAAAAAT 


GGGCAATATG 


CTGAAACGCA 


3180 


TGTAAGCAGA 


TTAAAGCAAT 


TCTCAAACAA 


GACTTCCTTG 


GCTCATTGCA 


AG C T AAAG AC 


3240 


AGGGCGAACC 


CATC AG AT TC 


GTGTGCACCT 


TTCGCATCAT 


AATCTTCCTA 


TCCTGGGAGA 


3300 


CCCTCTCTAT 


AATAGTAAAT 


CAAAGACAAG 


CCGGCTTATG 


CTTCATGCCT 


TCCGACTTTC 


3360 


CTTTACCCAC 


CCACTTACTT 


TAGAGAAGCT 


AACTTTCACT 


ACCCTTTCAA 


ATACATTTGA 


3420 


AAAAGAATTA 


AAAAAGAATG 


GATGATCGTG 


TCATCCATTT 


TTCCATATAA 


AAAAGCAAGA 


3480 


CCACAAAGCC 


TTGCTTTCTA 


TCAACTCAAG 


AATTATTTAG 


CAATTTTTGC 


GAAGTATTCA 


3540 


AGAGTACGAA 


CAAGTTGTGC 


AGTGTATGAC 


ATTTCGTTGT 


CGTACCATGA 


TACAACTTTA 


3600 


ACCAATTGTT 


TACCGTCAAC 


GTCAAGAACT 


TTAGTTTGAG 


TTGCGTCAAA 


CAATGAACCG 


3660 


TAAGACATAC 


CTACGATATC 


TGAAGATACG 


ATTGGATCTT 


CTGTGTAACC 


GTATGATTCG 


3720 


TTTGAAGCTG 


CTTTCATAGC 


TGCGTTCACT 


TCATCAACAG 


TAACGTTCTT 


TTCAAGAACT 


3780 


GCTACCAATT 


CAGTAACTGA 


TCCAGTTGGA 


GTTGGAACGC 


GTTGTGCAGA 


TCCGTCAAGT 


3840 


TTACCATTCA 


ATTCTGGGAT 


TACAAGACCG 


ATAGCTTTTG 


CAGCACCAGT 


TGAGTTAGGA 


3900 


ACGATGTTTG 


CAGCACCAGC 


GCGAGCACGG 


CGAAGGTCAC 


C AC C ACGGTG 


TGGTCCGTCA 


3960 


AGGATCATTT 


GGTCACCAGT 


GTAAGCGTGG 


ATAGTAGTCA 


TCAATCCTTC 


AACAACACCA 


4020 


AAGTTGTCTT 


GAAGAGCTTT 


AGCCATTGGA 


GCCAAGCAGT 


TTGTAGTACA 


TGAAGCACCT 


4080 


GAGATAACTG 


TTTCAGTACC 


GTCAAGAACG 


TCGTGGTTAG 


TGTTGAATAC 


AACTGTTTTA 


4140 


ACGTCGTTTC 


C AC C AGGAGC 


AGTGATAACA 


ACTTTTTTAG 


CTCCACCTTT 


AAGGTGTTTT 


4200 


TCAGCTGCTT 


CTTTCTTAGC 


AAAGAAACCA 


GTAGCTTCAA 


GAACGATTTC 


TACACCGTCA 


4260 
GTAGCCCAGT CGATTTGTTC TGGATCACGT TCAGCAGAAA CTTTGATGAA TTTACCGTTA 4320

ACTTCAAATC CACCTTCTTT AACTTCAACA GTACCGTCGA AACGACCTTG AGTTGTGTCG 4380

TATTTCAACA AGTGTGCAAG CATAACTGGA TCTGTAAGGT CGTTGATGCG TGTAACTTCA 4440

ACACCTTCTA CGTTTTGGAT ACGACGGAAA GCAAGACGAC CGATACGTCC GAAACCGTTA 4500

ATACCAACTT TAACTACCAT TAGTGATTTC CTCCTTATGA AAATCATGAA ATTTTTATTG 4560

TGAAAAGAGT AACTTGAATC ACTACAAATC ACCTTTCAAC AAACCTATTA TACAACTATT 4620

TGAGTTGAAT TGCAAGTATG GCCATTGTTT TTCTATGTTA GTTTCTTTTT AAGACTGTAA 4680

ACCAAGGAAT CCCTTACTAT TCATAGCATA ACGATTCTAT AGGATCCATT TTACTAATCT 4740

TACGCGCCGG GAAGTAGGCT GAGACATAAC CAAGTAATAG AGCGAAAACT AGAGTTCCTA 4800

AAACAGATAA AAGATTTAAT TTAAAAACCT TAGTGATGGA TGGGTAAAAG TGACTTACAA 4860

TCGCATTCGC CAAACTTCCC ACCCCTTGTG CAACCAAAAA TGCCAGCAGC AAGGCGATGC 4920

CTACAATCCA GATAGCCTCG TAAATAAAAA TTCCTTTGAC ATCACGATTC TGATAACCAA 4980

CTGCTTTCAT GACACCTATT TCCTTGGAAC GTTGCATGAT ATTGATGTAA ATAATGATAC 5040

CAATCATAAC CGCTGCTACC ACAATAGCTT GTGATGAAAG CACAATCAAT AATCCCTGAA 5100

TAACACGAAT AAAGGTAATC ACAATATCAA GAACTCTCTG TTGAGAAAGC ACAGTATACT 5160

TCTTATTTTT CTGTAATTCT TCTGTTACTA CTTTTGTCTG TGATGGATCT TTGAGTTCCA 5220

AGATAAAATA AGATACAGCT TTCGTAAATC CAGCCTCTTT CAAAATCGTT TCCATTTGAT 5280

GAGACAGCAT GAAACTGTTG CTGTCCTCCA TGTCATCTTC ATCATTGATT ACACGTACAA 5340

TCTTCGTTTG AAATTGAGCA ATCTTACTAG TTTCGGCAGC ACTTTCTACA ATGCTGGCTG 5400

AGACTGATTT GCCAATAAGA TCATTAGCTG TCAAATTTTT TCCTGTCTGT TCATTCCAAT 5460

TTTTTAGTAA ACTGCTTGGA ATCGTTAATC CCTGTTCATT TGTATCAGTA TAGAGGGATC 5520

CAGCCAACAC TTTGTCCGTC TCATTATTAC TAACAGAGAT ACTTGTATCA TCATAAAGAC 5580

TCACTACTTG AGCATAAGAA GGCATCGTTT GACTCAGATC CATTTCTTGC CCATCTATAG 5640

TAATATTTGA CATGTTCATC CCAAAAGGAC TCTCCAAATA TTTAATAGCT TCTTTCCCAA 5700

CTGTATCCGT GATATATAGT CAATTGAAAC AAGAGCAGGA TAAAAAAGCC TCGTAAAAGG 5760

TATTGCAACT TGGTAATACC TTTTTGAGGT GCTTTTTGAT ATGAGCCCAT GTTTTCTCAA 5820

TAGGATTGTA CTCAGGCGAG TAGGGAGGAA GAGGTAAAAG TTTATGCCCA AACTCTTCGC 5880

ATAAAAGTTC TAGCTTCCCC ATTCTATGGA ATCTTACATT ATCCATAATA ATAACCGATG 5940

GTGTGTTTAA TGTTGGTAAG AGAAAATTCT GAAACCAAGC TTCAAAAAAG TCGCTCGTCA 6000

TCGTCTCTTC GTAAGTCATT GGAGCGATTA ATTCACCATT TGTTAGACCT GCAACCAAAG 6060
AAATCCTCTG ATATCTTCTT CCAGATACTT TGCCTCTTAT TAATTGACCT TTTAATGAGC 6120

GACCATATTC TCGATAAAAA TAAGTATCGA ATCCTGTTTC GTCAATCTAA ACAGGTGCTA 6180

GGTGCTTTAA ACTATTAAAA TTCTTAAGAA ATAAGGCTAC TTTTTCTGGG TCTTGTTCAT 6240

AGTAGGTGTG GTTCTTTTTT CGAGTGTAGC CCATAGCTTT GAGCGTATAG TGGATGGTAG 6300

TTGGATGACA GCCAAATTCA GAAGCTATTT CAGTCAAATA AGCGTCTGGA TTGTCAGTAA 6360

GATAGTTTTT AAGTCTATCT CTATCAACCT TTCTTGGTTT TATTCCTTTT ACTTGGTGGT 6420

TTAGCTCTCC TGTTTTCTCT TTTAGCTTTA ACCAGCCATA AATGGTATTA CGTGAGATTT 6480

GGAAAACGTG TGATGCTTCT GTTATACTAC CTGTTCGCTC ACAATAAGAG AGAACTTTTT 6540

TACGAAAATC TATTGAATAT GCCATAAAAA GATTATACCA CATTGTGTAC TATTTTTGGT 6600

TCATTTTACT ATATTTGAAG AGGCGTTTAA ACTATCTGAC ATAAAACTCG TTCTAGAGGA 6660

AAGACATCCT TTAAAAAGTT AGTTTATTTT ACAACTTAGA CATCAAGGTA GGTTAACCCC 6720

TTCATGGAAA AATCAAGACT CTTAGCACTA TGGGTTAAAC TACCACTGGA GACGTAATCA 6780

ATCGCTAAAC CACGAAAACG GCTAATAGTG GTCATATCAA TATTTCCAGA ACATTCAATC 6840

CGAGAACGTC CTGCAATTAG GGTAATGGCC TGTTCAATCT GTTCCAATGA CATATTATCC 6900

AACATGATAA TATCAGCACC CGCCGCCGCA GCTTCTTCGG CAGCAGCAAG GCTTTCCACT 6960

TCCACCTCGA CCATTTTCAC AAAAGGGGCA TAGGCACGCG CTTGAGCAAT TGCCTTTTGA 7020

ACACTACCTA CTGCCGCAAT GTGATTGTCT TTTAGCAGGA TAGCATCTGA TAAATTAAAG 7080

CGATGATTAT AGCCACCGCC AACTCTCACG GCATATTTCT CAAAAAGACG TAAATTAGGA 7140

GTAGTTTTTC GAGTATCAAA TACCTTAATG CAATCATCGC CTAAGGCTTC TACATAAGCA 7200

GCTGTCATCG AAGCAATCCC TGATAAATGT TGTAAAAAAT TCAAGGCAAC GCGTTCACAT 7260

GTTAAGAGAC TTCTCACCGA GCCTATGATT TCTAAAACCA AATCGCCACT AGTCAAACGA 7320

TCCCCATCCT TAAATTGATG AGGATTCTGG AAGGTCACCT CGGCATCAAA TAGGGTAAAA 7380

ACCCTTTGAA AAACGGTTAG CCCCGCTAAA ACACCAGCTT CCTTGGCAAA AAGCGACACC 7440

TTGGCTTGGC CATGATGATC AAAAATGGCA TTGGTACTGT AATCTTCGGA ATGAACATCT 7500

TCTCGCAAGG CTGCTTTCAA TGTATCATCT ATTTGAAAAG GGGTTAAATC AGTTGAAATG 7560

ATTGACATCA C 7571

(2) INFORMATION FOR SEQ ID NO: 3: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 26385 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3: 
TTTGCTAGTG GCTTAAATTC TTCAGGAAAA TCAGGCGTAT CTAAAAGTCG TGTCGTTTTT 60

GTTTCATCTA TATAAAGACT TCCTGCTCCC CCTACAACTA GAAAACGTGT CTGTGTTCCA 120

GCAAGAAGCT GATTAAATAG TTCGATTGAT TTGCTGTGGA GCGGTAGCGT ATCTGGTGTA 180

TAAGCACCAA ACGCTGAAAT AACAGCATCA AATCCAGTAA GATCATCTTT TGTCAACTCA 240

AATAAATCTT TTTTAATAAT AGACTCAGCT TGACTTTTGT TTTCAGAACG AACAATAGCC 300

GTTACTTCAT GTCCTCGTTT GACTGCTTCT TCAACAATTG CTTTCCCCGC TTGTCCATTT 360

GCTGCAATAA CTGCTAGTTT CATTTTTTAT ACCTCTCTTG TTGTAATTAT TTTAGTTACA 420

GAAATTGTGA CACTCTTAAT AATCAATGTC AATAGTCTTG CTTAATTATT ATCAAAATAT 480

TTCTACCAAG AAAACTAACC ATGATTCTAG TGAAAAAAAA TCTTCTTTGT CAACAAATTT 540

ACTTTCTTGT TTTAAACATG CTATAATAAT CATAGCAAGA GATCTAAGTT GTCTGTTTTT 600

TTAAAACGAG GTGATTATCA TGCGTAGATT CTATTCCCAT CTCCCCTACT ATCTGGTCAT 660

ATTATTCTTT TATTGGCCAC TTTATGAGTT GTTCTTACTA GTTGTTTCTG ACCCCCTTAC 720

ACTCAAGGGA CTCTATATAA ACAATCTTCT CTTCTTTACA CCTCTGGTAA TCTTGATTGT 780

ATCGTTACTC TATAGCTACC GTTTCCGTTT CTCACTTTGA TGGTTAGTTG GTAACGGACT 840

GCTCTTTTAC TTTACTATCA TAACCTTTGG TGAGTTTATA CTAATTTACT TGCTAATCTA 900

TGAAACAGTT GCTCTGGTCG GCATGGATTC TGGTATTAGC ATCAAGCATA TTCTACAAAA 960

AATGAAAAAC AAAAAACTTT CACAAAATCC TTGAAAAATC TCACAATCAT GCTATAATAA 1020

TCCATAGAGA CAAGTCACTT AGTCCCTTTC TACTAGAGAG TGCGTGGTTG CTGGAAACGC 1080

ATAGGAAGTC TAAACTGATA CTACTCTTGA GTTTTTTATG AAAACATAAA ACGGTGGCCA 1140

CGTTAGAGCC GATCAGAGGT GTCCCTCTCT TTTGAGGTAC ATAAATGAAG GTGGAACCAC 1200

GTTGCGACGT CCTTTCGAGG ATGTCGCATT TTTTTATTAG GATACTAATT ATGGAGTTGC 1260

AAGAATTAGT GGAGCGCAGT TGGGCAATCC GACAAGCTTA TCACGAACTG GAAGTTAAGC 1320

ATCATGATTC CAAGTGGACG GTAGAAGAAG ACCTCTTGGC TTTATCTAAT GATATTGGAA 1380

ATTTCCAACG ACTGGTGATG ACAAAGCAAG GACGCTACTA TGATGAAACA CCCTACACAC 1440

TGGAACAAAA ACTTTCAGAA AATATCTGGT GGCTATTAGA ACTTTCTCAA CGTTTGGATA 1500

TAGACATTCT GACGGAAATG GAAAACTTCC TCTCTGATAA AGAAAAGCAA TTGAACGTTA 1560

GGACTTGGAA GTAGTCTGCT GATAAAAAAT CAATGCTTAG AAACTATGAA ATAATAAAAA 1620
AGGAGAACAT CATGATTAAC ATTACTTTCC CAGATGGCGC TGTTCGTGAA TTCGAATCTG 1680

GCGTAACAAC TTTTGAAATT GCCCAATCTA TCAGCAATTC CCTAGCTAAA AAAGCCTTGG 1740

CTGGTAAATT CAACGGCAAA CTCATCGACA CTACTCGCGC TATCACTGAA GATGGAAGCA 1800

TCGAAATTGT GACACCTGAT CACGAAGATG CCCTTCCAAT CTTGCGTCAC TCAGCAGCTC 1860

ACTTGTTCGC CCAAGCAGCT CGTCGTCTTT TCCCAGACAT TCACTTGGGA GTTGGTCCAG 1920

CCATCGAAGA TGGTTTCTAC TACGATACTG ACAACACAGC TGGTCAAATC TCTAACGAAG 1980

ACCTTCCTCG TATCGAAGAA GAAATGCAAA AAATCGTCAA AGAAAACTTC CCATCTATTC 2040

GTGAAGAAGT GACTAAAGAC GAGGCACGTG AAATCTTCAA AAATGACCCT TACAAGTTGG 2100

AATTGATTGA AGAACACTCA GAAGACGAAG GCGGTTTGAC TATCTATCGT CAGGGTGAAT 2160

ATGTAGACCT CTGCCGTGGA CCTCACGTTC CATCAACAGG TCGTATCCAA ATCTTCCACC 2220

TTCTCCATGT AGCTGGTGCG TACTGGCGTG GAAACAGCGA CAACGCTATG ATGCAACGTA 2280

TCTACGGTAC AGCTTGGTTT GACAAGAAAG ACTTGAAAAA CTACCTTCAA ATGCGTGAAG 2340

AAGCTAAGGA ACGTGACCAC CGTAAACTTG GTAAAGAGCT TGACCTCTTT ATGATTTCAC 2400

AAGAAGTGGG ACAAGGTTTG CCATTCTGGT TGCCAAATGG TGCGACTATC CGTCGTGAAT 2460

TGGAACGCTA CATCGTAAAC AAAGAGTTGG TTTCTGGCTA CCAACACGTC TACACTCCAC 2520

CACTTGCTTC TGTTGAGCTT TACAAGACTT CTGGTCACTG GGATCATTAC CAAGAAGACA 2580

TGTTCCCAAC CATGGACATG GGTGACGGGG AAGAATTTGT CCTTCGTCCA ATGAACTGTC 2640

CGCACCACAT CCAAGTTTTC AAACACCATG TTCACTCTTA CCGTGAATTG CCAATCCGTA 2700

TCGCTGAAAT CGGTATGATG CACCGTTACG AAAAATCTGG TGCCCTCACT GGCCTTCAAC 2760

GTGTACGTGA AATGTCACTC AACGACGGTC ACCTATTCGT TACTCCAGAA CAAATCCAAG 2820

AAGAATTCCA ACGTGCCCTT CAGTTGATTA TCGATGTTTA TGAAGACTTC AACTTGACTG 2880

ACTACCGCTT CCGCCTCTCT CTTCGTGACC CTCAAGATAC TCATAAGTAC TTTGATAACG 2940


ATGAGATGTG 


GGAAAATGCC 


CAAACCATGC 


TTCGTGCAGC 


TCTTGATGAA 


ATGGGCGTGG 


3000 


ACTACTTTGA 


AGCCGAAGGT 


GAAGCAGCCT 


TCTACGGACC 


AAAATTGGAT 


AT C C AG ATT A 


3060 


AAACTGCCCT 


TGGAAAAGAA 


GAAACCCTTT 


C T ACT ATC C A 


ACTTGATTTC 


TTGTTGCCAG 


3120 


AACGCTTCGA 


CCTCAAATAC 


ATCGGAGCTG 


ATGGCGAAGA 


TCACCGTCCA 


GTCATGATCC 


3180 


ACCGTGGGGT 


TATCTCAACT 


ATGGAACGCT 


TCACAGCTAT 


CTTGATTGAG 


AACTACAAGG 


3240 


GGGCCTTCCC 


AACATGGCTG 


GCACCACACC 


AAGTAACCCT 


CATCCCAGTA 


TCTAACGAAA 


3300 


AACACGTGGA 


CTACGCTTGG 


GAAGTGGCCA 


AG AAACT C CG 


TGACCGCGGT 


GTCCGTGCAG 


3360 
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ACGTAGATGA GCGCAATGAA AAAATGCAGT TCAAGATCCG TGCTTCACAA ACCAGCAAGA 3 42 0 

TTCCTTACCA ATTAATTGTT GGAGACAAAG AAATGGAAGA CGAAACAGTC AACGTTCGTC 3480

GCTACGGCCA AAAAGAAACA CAAACTGTCT CAGTTGATAA TTTTGTTCAA GCTATCCTAG 3540 

CTGATATCGC CAACAAATCA CGCGTTGAGA AATAAGAGTC TAGCATAAAA GCCTCCAATC 3600

TGGAGGCTTT TTCTCATCTA TTTTTACTCA AGGACTAAGT TCACTTGAGC AAACTGAATC 3660

CGCACTGTCG TTCCTTTTCC GACCTCAGAC TCGATACGAA TCTGGTGCCC CAGTTCTTCA 3720

GAAATTTTCT TAGATAGATA AAGGCCAAGT CCAGAGGACT GCTGGGTCAA ACGGCCATTG 3780

TATCCTGAAA AGCCACGTTC AAATACTCGG AGGACATCAC TGTTTTTTAT CCCGATTCCC 3840 

GTATCTTTGA TACAAAGCTC TTGGTCATCC ATATAAATCT CCAGACCACC TTCCTTGGTG 3900 

TACTTGAGAC TGTTTGAGAT GATTTGCTCA ATAACCACTA GCAGCCACTT TTTATCCGTC 3960

ACGATTTCTT TATCAAGGTC ATGTAGATTG ACATTTAAGC CTTTTTGAAT AAAGAAAAGA 4020

GCATATTTAC GAATTATTTC CTTGACCAAG TCCTCAATTT GAACCTGCTT TAAGACCAAA 4080

TCATCATGGA AACTTTCTAA ACGCAGGTAC TGTAAAACTA GGTTGGTATA GGAGTCGATT 4140 

TTGAAAATTT CCTGTTCTAG CTGCTGCTTC AGTTGGCGGT CGACCACTTC TGCAACTAAG 4200

AGTTGACTGG CTGCAATGGG GGTCTTTATC TGATGGACCC ACAAGGTATA GTAATCCAGC 4260

AAATCCGTCA GTTTTCTTTC TGCTTTTGAC CTCTGCTGAT AGAGTTCCAT CTCACGCGCT 4320 

TCTAATTTTT CTGCTAAAGC TATTTCCAAA GGAGACTTGG CTTCCCTCTC TCCATAGAGA 4380

AGTTCCTGGC GATAGACCTG CGTTTCCACC AATATGTCCC AAGTGAAAAA TAATATGGTT 4440 

ACAAAGCAAC ACAAGAAGAA AAAGTAGAGG AAGTAAATTC CTAGACTGGC AAATAAAAAC 4500

TGAAAGAGTA AGACAAGAAA TGCCAAAGAA AGCAGATAGA TAAAAAGACG ACTACGGGAG 4560

CGCAGATAGG CTAGAAAAAA TTGTTTCCAA TCAAGCATGC TTCAATCCGT ACCCTATTCC 4620

TTTCTTGGTC TCGATAAATC CTACCAATCC CTGCTCCTCC AACTTTTTAC GCAAACGAGC 4680

CACATTGACA GAGAGGGTAT TATCATCAAT GAAAAAGTCA CTGTTCCAAA GTTCCCGCAT 4740

CAGGTCGTCA CGTGCTACGA TGTTGCCTGC ATGCTCAAAT AACACGCGTA AAATCTGGAA 4800

TTCATTCTTG GTCAAATTCA AGACTTGCCC TTGATAATGT AAATCCATGG ATTTGGTATT 4860

GAGGATAACA CCAGCATATT CCAGCAAACT CTCATCACGC CCAAACTCAT AGGAACGACG 4920

CAACAAGCCC TGAACCTTAG CTAAAAGAAC CTGCTGGTCA AAAGGCTTGG TCACAAAGTC 4980

ATCCGCCCCC ATATTGATTG CCATGACAAT ATCCATAGCC TGGTCTCTCG AAGAAAGAAA 5040 

CATGATAGGT ACCTTGGAAA TCTTGCGGAT TTCCTGACAC CAGTGATAAC CATTAAACAA 5100

GGGCAAACCA ATATCCATGA GGACCAGATG AGGTTCCGAC TGAACAAATA GACTCAAAAC 5160 






TTCCATAAAG TCTTCTACCA GGACCACTTC AAATCCCCAT TCAGAGAGCA TTTTCCCAAT 5220
CTGTTGACGA ATGACCTGAT CATCTTCTAT TAATAAAATC TTGTGCATGC GCTTCTCCTT 5280
TTCCATTATT ATAACAGATT TTTCCATGCT AGATGGTCTG AAACTGAATT TGAAATAGCC 5340
TGTTTTTAGC CAGTACAAAC AGGCTATGCT ACTAGCTAAT TTGAGGGAAA TTTGCTAAGA 5400
TAAATAAAAA GAAAGGAGCT CTTATGGCCA ATATTTTTGA CTATCTGAAA GATGTCGCAT 5460
ATGATTCTTA TTACGACCTT CCCTTGAATG AGTTAGACAT TCTAACCTTA ATAGAAATCA 5520
CCTACCTCTC CTTTGATAAT CTGGTCTCCA CACTTCCTCA ACGTCTTTTA GATCTAGCAC 5580
CTCAGGTTCC AAGAGATCCC ACCATGCTTA CTAGCAAAAA TCGCCTTCAA TTATTAGATG 5640
AATTGGCTCA ACACAAGCGC TTCAAAAATT GCAAACTCTC CCATTTTATC AACGACATCG 5700
ACCCTGAACT GCAAAAGCAA TTTGCGGCTA TGACTTATCG TGTCAGCCTC GATACCTATC 5760
TGATTGTCTT TCGTGGGACA GATGACAGTA TCATTGGCTG GAAGGAAGAT TTCCACCTGA 5820
CCTATATGAA GGAAATTCCT GCTCAAAAGC ACGCCCTTCG CTATTTAAAG AACTTTTTTG 5880
CCCATCATCC TAAGCAAAAG GTTATTCTAG CTGGGCATTC CAAGGGAGGA AATCTCGCTA 5940
TCTATGCTGC TAGCCAAATT GAGCAAAGTT TGCAAAATCA GATCACAGCA GTTTATACAT 6000
TTGATGCACC TGGTCTCCAT CAAGAATTGA CACAGACTGC GGGTTATCAA AGGATAATGG 6060
ATAGAAGCAA GATATTCATT CCACAAGGTT CCATTATCGG TATGATGCTG GAAATTCCTG 6120
CTCACCAAAT CATCGTTCAG AGTACTGCCC TGGGTGGCAT CGCCCAGCAC GATACCTTTA 6180
GTTGGCAGAT TGAGGACAAG CACTTCGTCC AACTGGATAA GACCAACAGT GATAGCCAGC 6240
AAGTAGACAC AACCTTTAAA GAATGGGTGG CCACAGTCCC TGACGAAGAA CTTCAGCTCT 6300
ACTTCGACCT CTTCTTTGGC ACTATTCTTG ATGCTGGTAT TAGCTCTATC AATGACTTGG 6360
CTTCCTTAAA GGCGCTTGAA TACATTCATC ATCTCTTTGT CCAAGCTCAA TCCCTCACTC 6420
CAGAAGAAAG AGAAACCTTG GGTCGCCTTA CCCAGTTATT GATTGATACT CGTTACCAGG 6480
CATGGAAAAA TAGATAATAC TCTTGAAAAT TAAATGTATA CAAAACAAAA GACCTAGAAT 6540
ACATACTTTC ATGTGCATTC TAAGTCTTTT TAAATAGAAT CTAATAGTCA ATAAAAATCA 6600
AAGAGCATTG AGAGATAATG GGGCTTGGAA CGTCCCTCTC GCTTCAACAA AATGACCCCA 6660
TTATAGATTA AAAAGATGCC ACTTAGAAAA AGCAAAAAAG GAAGTAAGAC AAAGGCAAAT 6720
ATATAAAAAG CTAACTGAAC ATTCTCGTAT CCATTTTTAT AAAAAAGGTA GGATAGATAA 6780
AAATAACTTG AAATGAGGGA TAATAAAAAT AATACTGGAT TCCACAAACT TCTATTATCC 6840
TTCCAAAATG ACACTATAAA GGCTAATACA ATTCCTATAA CGAGATACAT TTCTTACTCC 6900




TTTAATAGCT ACATTTTATC ATAATTATCC AAAGAAAAAA GAGGGCATTT ATCCCTCTTA 6960 

ATCCTTCATC TGACTCTCTG CATCGGCCAC GACTTTTTCT AGACTGGTTT GACCAAGTTC 7020

TGCCTCCATA GTCAACTGAA TTCTCTCCAA TTTTTGATCC AAAACATCAT GAATATGAGC 7080

TCCTACAGGG CAATTTGGAT TCGGATTGTC ATGGAAACTG AAGAGTTGAC CTGTCTTACC 7140

AAGACATTCG ACCGCCTGAT AAACATCTAA AAGACTAATA TCCTTAAGGT CCTTGACAAT 7200

CTCTGTTCCG CCGGTTCCAC GCGCTACTGA AATCAGCTCT GCCTTCTTCA ACTGGGACAA 7260

GATCTTTCTG ATAATGACAG GATTGACCCC GACACTAGCA GCCAGAAAAT CACTGGTCAC 7320

CTTGCTTTCC TTCCCCTCGA GGGCAATGAT TATCAGCATA TGAGTCGCAA TGGTAAATCT 7380

ACTTGGAATT TGCATCCTCT TCTCCTTTTT ACGAGGCTAC CCTGCCTCTA CTCTTCTTTT 7440

TCTATTATTA TACCCTTTTT AGTTGTAATG TCAATCGTTA CCACTTTTCA ACCAGTCGTC 7500

TAACTCCCGA TCGCAGCCCT CTTTCTGAGC CAATTCTCTC AAAAATTCCT GATGATGAGT 7560 

ATGGTGGATC CCATTGACCA GACTTTCATA GTAAACCTCA AAATAGGGAA GTCTCAGGTC 7620

TTTAGCCAGC TGCAATTCAG CTGCTACATC GTAGTCTACC CGTCGGAAGT CCATATCTAC 7680

CAGGCCTTTG TCATCAAACT CCAAAATCAT ATACTGGGCC CGCAAGTCCT TCCGTAGCTG 7740

AGCGTCCAAA AAGAAAGGTT GGCCAATCGA ACCCGGATTG ACAATCAATT GCCCACCAGT 7800

CCCGTAACGA AGCAACTGCT GGTGAATATG TCCATAAACA GCAATATCAC AGGGAGGATG 7860 

AGTCACCAAG CGGTCAAACT CCTCTTGTTT GCCAGTATGA ATCAACTCTC GCCCCCAGTT 7920

CTTATCAGGC AGATGATGGC TAATTCCCAC CGTCAAATCC CCAAACTGAC GATGAATTTG 7980

AAGAGGTTGA TTGTGGAGCA CTTCAATTTC TTCTAGGGAA ATTTCCTCTA AAACATACTG 8040

GCACTGGCGC AAGAGATAGC GTTGACTGGG GCGAGTACTG TCCAATTCCT TACGGACACC 8100

ATGCCAAAGA CTGTCTTCCC AGTTTCCCAA AACTCTAGCC GTAATCGGTA GTTGATCCAA 8160 

CAAGTCCAAA ATCCTTCTAC GCCCTGTCCC TGGCATGAGA ATATCTCCCA AAAGCCAGTA 8220

TTCATCCACT CCTATCTGCC GAGCATCTGC CAAAACAGCC TCCAAGGCGG TGGTATTTCC 8280

ATGAATATCT GAAAGAAGAG CTATTTTCGT CATATCCATC TCCTCGTTTT TTCTCTTGCA 8340

ATAAGTATAA CATAAAAAGT CACAGCTAGA GAAATCTAGC TTTTTTTGAT ATACTAGATA 8400

AAGATATTAG ACAAGAGGAA ACGAATGACC CCAAACAAAG AAGACTATCT AAAATGTATT 8460

TATGAAATTG GCATAGACCT GCATAAGATT ACCAACAAGG AAATTGCGGC TCGCATGCAA 8520

GTCTCTCCCC CTGCCGTAAC TGAAATGATC AAACGAATGA AAAGTGAAAA TCTCATCCTA 8580 

AAGGACAAGG AATGTGGCTA TCTACTGACT GACCTCGGTC TCAAACTGGT CTCTGAGCTC 8640

TATCGTAAGC ACCGCTTGAT TGAAGTTTTT CTAGTTCATC ATTTAGACTA TACAAGTGAC 8700
CAGATTCACG AGGAAGCTGA GGTCTTGGAA CACACTGTCT CTGACCTGTT CGTGGAAAGA 8760

CTAGATAAAC TGCTAGGTTT CCCTAAAACC TGCCCCCACG GGGGAACTAT TCCTGCCAAG 8820

GGAGAACTAC TCGTTGAAAT CAATAACCTC CCACTAGCTG ATATCAAGGA AGCTGGCGCC 8880

TACCGCCTGA CTCGGGTGCA CGATAGTTTT GACATTCTCC ATTATCTGGA CAAGCACTCA 8940 

CTTCACATCG GTGACCAGCT CCAAGTCAAG CAGTTTGATG GCTTCAGCAA TACCTTCACT 9000 

ATCCTCAGTA ACGACGAGGA TTTACAAGTG AATATGGACA TTGCAAAACA ACTCTATGTC 9060 

GAGAAAATCA ACTAATTTCT CAAGTCCCCT ACCAACCCTG AAAGTTTTAT TTTGGCTCTT 9120

TGTCAACTGT AGTGGGTTGA AGTCAGCTAA GCTCGAGAAA GGACAAATTT TGTCCTTTCT 9180

TTTTTGATAT TCAGAGCGAT AAAAATCCGT TTTTTGAAGT TTTCAAAGTT CCGAAAACCA 9240

AAGGCATTGC GCTTGATAAG TTTGATGAGA TTATTGGTCG CTTCCAGTTT GGCATTAGAA 9300

TAGTGTAGTT GAAGGGCGTT GACAATCTTT TCTTTATCTT TGAGGAAGGT TTTAAAGACA 9360

GTCTGAAAAA TAGGATGAAC CTGCTTTAGA TTGTCCTCAA TGAGTCCGAA AAATTTCTCC 9420

GGTTTCTTAT TCTGAAAGTG AAACAGCAAG AGTTGATAGA GCTGATAGTG GTGTTTCAAG 9480 

TCTTGTGAAT AGCTCAAAAG CTTGTCTAAA ATCTCTTTAT TGGTTAAGTG CATACGAAAA 9540

GTAGGACGAT AAAATCGCTT ATCACTCAGT TTACGGCTAT CCTGTTGTAT GAGCTTCCAG 9600

TAGCGCTTGA TAGCCTTGTA TTCATGGGAT TTTCGATCCA ATTGGTTCAT AATTTGAACA 9660

CGCACACGAC TCATAGCACG GCTAAGATGT TGTACAATGT GAAAGCGATC CAACACGATT 9720

TTAGCATTCG GGAGTGAAAC AGTCTGGGAG ACTGTTTCAG CCTGAGCCTA GAAATTTGAA 9780

AGCGAAGCTG TTTAGCCAAG TCATAGTAAG GACTAAACAT ATCCATCGTA ATGATTTTCA 9840

CTTGACAACG AACGGCTCTA TCGTAGCGAA GAAAGTGATT TCGGATGACA GCTTGTGTTC 9900 

TGCCTTCAAG AACAGTGATA ATATTAAGAT TATCAAAATC TTGCGCAATG AAACTCATCT 9960

TTCCCTTAGT GAAGGCATAC TCATCCCAAG ACATAATCTT TGGAAGCCGA GAAAAATCAT 10020

GCTCAAAGTG AAAGTCATTG AGCTTGCGAA TGACAGTTGA AGTTGAAATG GCCAGCTGAT 10080

GGGCAATATC AGTCATAGAA ATTTTTTCAA TTAACTTTTG AGCAATyTTT TGGTTGATGA 10140 

TACGAGGGAT TTGGTGATTT TTCTTTACCA GGGGAGTCTC AGCAACCATC ATTTTTGAAC 10200 

AGTGATAGCA CTTGAAACGA CGCTTTCTAA GGAGAATTCT AGAAGGCATA CCAGTCGTTT 10260 

CAAGATAAGG AATTTTAGAA GGTTTTTGAA AGTCATATTT CTTCAATTGG TTTCCGCACT 10320 

CAGGGCAAGA TGGGGCGTCG TAGTCCAGTT TGGCGATGAT TTCCTTGTGT GTATCCTTAT 10380 

TGATGATGTC TAAAATCTGG ATATTAGGGT CTTTAATGTC TAGTAATTTT GTGATAAAAT 10440 
GTAATTGTTC CATATGATTC TTTCTAATGA GTTGTTTTGT CGCTTTTCAT TATAGGTCAT 10500

ATGGGACTTT TTTTCTACAA TAAAATAGGC TCCATAATAT CTATAGTGGA TTTACCCACT 10560

ACAAATATTA TAGAACCGTA AAAATAGAAG GAGATAGCAG GTTTTCAAGC CTGCTATCTT 10620

TTTTTGATGA CATTCAGGCT GATACGAAAT CATAAGAGGT CTGAAACTAC TTTCAGAGTA 10680

GTCTGTTCTA TAAAATATAG TAGATTGAAA TAAGATGTGA ACAACTCTAT CAGGAAAGTC 10740 

AAATTAATTT ATAGAATTAT TTTAGCAGTC AAGGTGTACT GTTATAGATT CAATATATTA 10800

TATGACTATT AACCTTGTCT TCTCCTAAAA TTGACTTTCT TGTTTTCTTA TCTTGTCCAC 10860

TCGAAACAAG TATTGTAAGA ATTTGATTAT TTTTGAAAGT ACTTTTAATA TACTTGATAT 10920 

AGTTAAAAAA GATTTGAAAC TAAATTCCAA ATTAGAAAAA GACTTGAAAT ACTAAAAAAA 10980 

AAAAAGTATA CTCTAATTGA AAACGGTAAC AAAACTAATT TAGAGAATGA AATATAGAGT 11040 

ATTTCTCTCT TAAAAGTTTT TGGTGAAACG AGATGTAGAA AGGAGATTTA GCCAAAGAGT 11100 

CTATTAGTGC TAGAATAATA GATTAGAATT ATTTTAGAAA AACGAAGTGA GCAGCTTATA 11160 

AATTCAAGTC CCCAAATAGA TTCATACTAG TATCTTTTGC AAAAAATAAA GGGCGACTTC 11220

CTTCATGAAT ATCAATTTCA TCTATAAGGA AGGTAGCTAA TTGAACTAAC TTATTTATTC 11280

TGTTTGTCGC TAGAAAAATC AGACCTCCTT GTGAAGATTG AGGAGATACT TAATGAAAAT 11340

CAAAGAAGAA ACTAGCAAGC TAGTAGCAGA TTGCCCAAAA CACCGCTTTG AGGTTGTAGA 11400 

TAAGACTGAC CTATATAATC CAAGGTGAAG CGACTGTGGT TTGAAGAGAT TTTCAAAGAG 11460

TATAGGCTAG AGAGTAGTGT TTTTATGTCC TTCTAGTAGA AAATGCTAGA CAGAAGAATG 11520 

GGGAACTTGG ATAGGAAAAA TAGATTGAGA AAGGAGGTTA GAAGAGATGA TTATTACAAA 11580

AATTAGCCGT TTAGGAACTT ATGTGGGAGT AAATCCACAT TTTGCAACAT TAATAGATTT 11640

TCTAGAAAAA ACAGGACTAG AAAATTTAAC AGAAGGTTCG ATTGCTATCG ATGGTAATCG 11700 

ATTGTTTGGG AATTGCTTTA CTTATCTAGC AGATGGTCAA GCAGGGGCTT TCTTTGAAAC 11760

CCACCAAAAA TATTTGGATA TTCATTTAGT TTTGGAAAAC GAAGAAGCCA TGGCTGTTAC 11820 

ATCGCCGGAA AATGTAAGCG TTACCCAAGA ATATGATGAA GAGAAAGATA TTGAATTATA 11880

CACAGGGAAA GTGGAACAGT TGGTTCATTT GAGAGCTGGC GAATGCCTCA TCACTTTTCC 11940 

AGAAGATTTA CATCAACCCA AGGTTCGTAT AAATGATGAA CCTGTGAAAA AAGTTGTCTT 12000

TAAAGTTGCG ATTTCTTAAT GTAGAAAGAG AAGAACGATG AAAAAAATGA GAAAGTTTTT 12060

ATGTCTAGCT GGAATTGCGC TAGCGGCTGT TGCCTTGGTA GCTTGTTCAG GAAAAAAAGA 12120 

AGCTACAACT AGTACTGAAC CACCAACAGA ATTATCTGGT GAGATTACAA TGTGGCACTC 12180 

CTTTACTCAA GGACCCCGTT TAGAAAGTAT TCAAAAATCA GCAGATGCTT TCATGCAAAA 12240 
GCATCCAAAA ACGAAAATCA AGATTGAAAC ATTTTCTTGG AATGACTTCT ATACTAAATG 12300

GACTACAGGT TTAGCAAATG GAAATGTGCC AGATATCAGT ACAGCTCTTC CTAACCAAGT 12360

AATGGAAATG GTCAACTCAG ATGCTTTGGT TCCGCTAAAT GATTCTATCA AGCGTATTGG 12420

ACAAGATAAA TTTAACGAAA CTGCCTTAAA TGAAGCAAAA ATCGGAGATG ATTACTACTC 12480 

TGTTCCTCTT TATTCACATG CACAAGTCAT GTGGGTTAGA ACAGATTTGT TAAAAGAACA 12540

TAATATTGAG GTTCCTAAAA CTTGGGATCA ACTCTATGAA GCTTCTAAAA AATTGAAAGA 12600

AGCTGGAGTT TATGGCTTGT CTGTTCCGTT TGGAACAAAT GACTTAATGG CAACACGTTT 12660

CTTGAACTTC TACGTACGTA rTGGTGGAGG AAGCCTCTTA ACAAAAGATC TTAAAGCAGA 12720

CTTGACAAGC CAACTTGCTC AAGATGGTAT TAAATACTGG GTTAAATTGT ATAAAGAAAT 12780

CTCACCTCAA GATTCTTTGA ACTTTAATGT CCTTCAACAA GCTACCTTGT TCTATCAAGG 12840

AAAAACAGCA TTTGACTTTA ACTCTGGCTT CCATATCGGA GGAATTAATG CCAACAGTCC 12900

TCAATTGATT GATTCGATTG ATGCTTATCC TATTCCAAAA ATCAAAGAGT CTGATAAAGA 12960

CCAAGGAATT GAAACCTCAA ACATTCCAAT GGTTGTTTGG AAAAATTCAA AACATCCAGA 13020

AGTTGCTAAA GCATTCTTAG AAGCACTTTA TAATGAAGAA GACTACGTTA AATTCCTTGA 13080

TTCAACTCCA GTAGGTATGT TGCCAACTAT TAAGGGGATT AGCGATTCTG CAGCCTATAA 13140

AGAAAATGAA ACTCGTAAGA AATTTAAACA TGCTGAAGAA GTAATTACTG AAGCTGTTAA 13200

AAAAGGTACT GCTATTGGTT ATGAAAATGG GCCAAGTGTA CAAGCTGGTA TGTTGACTAA 13260

CCAACACATT ATTGAACAAA TGTTCCAAGA TATCATTACA AATGGAACAG ATCCTATGAA 13320

AGCAGCAAAA GAAGCAGAAA AACAATTAAA TGATTTATTT GAGGCTGTTC AGTAGATGTA 13380

AAAGACTAGA AAATAGGTGG GATAGTGAGC TGAAAAGCTC TAGCCCAATC TTGTAAAAGA 13440 

AGGGAGAAGG AGAATGGTTA AAGAACGTAA TTTAACTCGC TGGATATTTG TTTTGCCAGC 13500

TATGATTATC GTAGGATTAC TCTTTGTTTA TCCGTTTTTC TCGAGTATTT TTTATAGCTT 13560

TACCAATAAG CATTTGATTA TGCCTAATTA TAAATTTGTT GGTTTGGCTA ACTATAAAGC 13620

TGTGCTATCA GATCCCAACT TCTTTAATGC GTTCTTTAAT TCAATTAAGT GGACCGTTTT 13680

CTCATTAGTT GGTCAAGTTT TAGTAGGGTT TGTATTGGCT TTAGCTCTTC ACAGAGTACG 13740

CCACTTCAAG AAATTATATA GGACATTATT GATTGTTCCT TGGGCATTTC CTACCATCGT 13800

TATTGCCTTC TCTTGGCAGT GGATTCTAAA CGGGGTTTAT GGCTACTTAC CTAATCTAAT 13860 

CGTAAAATTA GGTTTAATGG AACATACACC TGCATTTTTG ACAGATAGTA CATGGGCATT 13920

CCTATGTTTG GTGTTTATCA ACATTTGGTT TGGAGCACCA ATGATTATGG TTAATGTGCT 13980
TTCAGCTTTG CAAACAGTAC CAGAAGAACA ATTTGAGGCT GCTAAGATAG ATGGTGCTTC 14040

AAGTTGGCAG GTGTTCAAGT TTATCGTCTT TCCACATATT AAAGTGGTTG TAGGACTTCT 14100 

AGTTGTTTTG AGAACTGTAT GGATCTTTAA TAACTTTGAC ATTATCTACC TCATTACTGG 14160 

TGGTGGACCA GCCAATGCTA CAACGACGCT TCCAATTTTT GCTTACAACC TGGGCTGGGG 14220 

AACTAAATTG TTGGGTCGTG CTTCAGCAGT TACAGTACTG CTCTTTATCT TCTTGGTGGC 14280

GATTTGCTTT ATCTACTTTG CTATCATCAG TAAGTGGGAA AAGGAGGGTA GAAAATAATG 14340

AAGAAGAAAT CCAGTATTTA TTTAGATATT CTCTCACATG TACTTTTAGT TGGTGCGACC 14400 

ATCGTTGCAG TTTTCCCATT GGTATGGATT ATCATATCTT CTGTCAAAGG GAAAGGGGAA 14460

TTAACTCAGT ATCCAACACG ATTTTGGCCT GAACAGTTTA CATTAGATTA TTTCACTCAT 14520

GTTATCAACG ATTTGCACTT CATTGATAAC ATTCGAAACA GTTTAATCAT TGCCTTGGCT 14580 

ACAACCCTTA TTGCGATTAT TATTTCTGCT ATGGCAGCCT ATGGTATTGT TCGATTCTTT 14640

CCTAAATTGG GAGCAATCAT GTCGAGACTA CTCGTCATTA CCTACATTTT CCCACCAATT 14700

TTGTTAGCAA TTCCCTATTC AATTGCCATT GCTAAAGTTG GGTTAACAAA TAGTTTATTT 14760

GGCTTGATGA TGGTTTATCT ATCTTTTAGT GTTCCATATG CAGTTTGGCT CTTAGTTGGA 14820

TTTTTCCAAA CAGTTCCAAT TGGAATTGAA GAAGCGGCTA GAATTGATGG TGCAAATAAA 14880

TTTGTTACGT TTTATAAAGT TGTGCTACCG ATTGTAGCAC CAGGTATTGT AGCAACAGCT 14940

ATTTATACAT TTATCAATGC TTGGAATGAA TTCCTGTATG CCTTGATTTT GATTAACAAT 15000 

ACAGGAAAGA TGACAGTAGC AGTAGCCCTT CGTTCACTTA ATGGTTCAGA AATACTAGAC 15060 

TGGGGAGATA TGATGGCAGC GTCTGTTATT GTAGTTCTTC CATCAATTAT TTTCTTCTCT 15120

ATCATCCAAA ATAAGATTGC AAGTGGATTA TCAGAAGGAT CTGTGAAGTA GACGAAAGAA 15180 

GGAAAAAAAT GAATAAAAGA GGTCTTTATT CAAAACTAGG AATTTCCGTT GTAGGCATTA 15240 

GTCTTTTAAT GGGAGTCCCC ACTTTGATTC ATGCGAATGA ATTAAACTAT GGTCAACTGT 15300 

CCATATCTCC TATTTTTCAA GGAGGTTCAT ATCAACTGAA CAATAAGAGT ATAGATATCA 15360

GCTCTTTGTT ATTAGATAAA TTGTCTGGAG AGAGTCAGAC AGTAGTAATG AAATTTAAAG 15420

CAGATAAACC AAACTCTCTT CAAGCTTTGT TTGGCCTATC TAATAGTAAA GCAGGCTTTA 15480

AAAATAATTA CTTTTCAATT TTCATGAGAG ATTCTGGTGA GATAGGTGTA GAAATAAGAG 15540

ACGCCCAAAA GGGAATAAAT TATTTATTTT CCAGACCAGC TTCATTATGG GGAAAACATA 15600 

AAGGACAGGC AGTTGAAAAT ACACTAGTAT TTGTATCTGA TTCTAAAGAT AAAACATACA 15660 

CAATGTATGT TAATGGAATA GAAGTGTTCT CTGAAACAGT TGATACATTT TTGCCAATTT 15720

CAAATATAAA TGGTATAGAT AAGGCAACAC TAGGAGCTGT TAATCGTGAA GGTAAGGAAC 15780 
ATTACCTCGC AAAAGGAAGT ATTGATGAAA TCAGTCTATT TAACAAAGCA ATTAGTGATC 15840

AGGAAGTTTC AACTATTCCC TTGTCAAATC CATTTCAGTT AATTTTCCAA TCAGGAGATT 15900

CTACTCAAGC TAACTATTTT AGAATACCGA CACTATATAC ATTAAGTAGT GGAAGAGTTC 15960

TATCAAGTAT TGATGCACGT TATGGTGGGA CTCATGATTC TAAAAGTAAG ATTAATATTG 16020

CCACTTCTTA TAGTGATGAT AATGGGAAAA CGTGGAGTGA GCCAATTTTT GCTATGAAGT 16080

TTAATGACTA TGAGGAGCAG TTAGTTTACT GGCCACGAGA TAATAAATTA AAGAATAGTC 16140 

AAATTAGTGG AAGTGCTTCA TTCATAGATT CATCCATTGT TGAAGATAAA AAATCTGGGA 16200

AAACGATATT ACTAGCTGAT GTTATGCCTG CGGGTATTGG AAATAATAAT GCAAATAAAG 16260 

CCGACTCAGG TTTTAAAGAA ATAAATGGTC ATTATTATTT AAAACTAAAG AAGAATGGAG 16320

ATAACGATTT CCGTTATACA GTTAGAGAAA ATGGTGTCGT TTATAATGAA ACAACTAATA 16380

AACCTACAAA TTATACTATA AATGATAAGT ATGAAGTTTT GGAGGGAGGA AAGTCTTTAA 16440

CAGTCGAACA ATATTCGGTT GATTTTGATA GTGGCTCTTT AAGAGAAAGG CATAATGGAA 16500 

AACAGGTTCC TATGAATGTT TTCTACAAAG ATTCGTTATT TAAAGTGACT CCTACTAATT 16560

ATATAGCAAT GACAACTAGT CAGAATAGAG GAGAGAGTTG GGAACAATTT AAGTTGTTGC 16620

CTCCGTTCTT AGGAGAAAAA CATAATGGAA CTTACTTATG TCCCGGACAA GGTTTAGCAT 16680 

TAAAATCAAG TAACAGATTG ATTTTTGCAA CATATACTAG TGGAGAACTA ACCTATCTCA 16740 

TTTCTGATGA TAGTGGTCAA ACATGGAAGA AATCCTCAGC TTCAATTCCG TTTAAAAATG 16800

CAACAGCAGA AGCACAAATG GTTGAACTGA GAGATGGTGT GATTAGAACA TTCTTTAGAA 16860 

CCACTACAGG TAAGATAGCT TATATGACTA GTAGAGATTC TGGAGAAACA TGGTCGAAAG 16920

TTTCGTATAT TGATGGAATC CAACAAACTT CATATGGCAC ACAAGTATCT GCAATTAAAT 16980

ACTCTCAATT AATTGATGGA AAAGAAGCAG TCATTTTGAG TACACCAAAT TCTAGAAGTG 17040

GCCGCAAGGG AGGCCAATTA GTTGTCGGTT TAGTCAATAA AGAAGATGAT AGTATTGATT 17100 

GGAAATACCA CTATGATATT GATTTGCCTT CGTATGGTTA TGCCTATTCT GCGATTACAG 17160

AATTGCCAAA TCATCACATA GGTGTACTGT TTGAAAAATA TGATTCGTGG TCGAGAAATG 17220

AATTGCATTT AAGCAATGTA GTTCAGTATA TAGATTTGGA AATTAATGAT TTAACAAAAT 17280

AAAGGAGAAA AACATGGTTA AATACGGTGT TGTTGGAACA GGGTATTTTG GAGCTGAATT 17340 

GGCTCGCTAC ATGCAAAAGA ATGATGGAGC AGAGATTACT CTTCTCTATG ATCCAGATAA 17400

TGCAGAGGCG ATTGCAGAAG AATTGGGAGC AAAAGTAGCA AGTTCCTTAG ATGAGTTGGT 17460

TTCTAGCGAT GAAGTAGATT GTGTTATCGT CGCAACTCCA AATAATCTTC ATAAGGAACC 17520
GGTTATTAAG GCTGCACAGC ATGGTAAAAA TGTTTTCTGT GAAAAACCAA TTGCGCTTTC 17580
TTATCAAGAT TGTCGCGAGA TGGTAGATGC GTGTAAAGAA AACAATGTAA CCTTTATGGC 17640
AGGACATATT ATGAATTTCT TTAATGGTGT TCATCATGCA AAAGAACTCA TTAATCAAGG 17700
AGTTATCGGA GACGTTCTAT ATTGTCATAC AGCTCGTAAT GGTTGGGAAG AACAACAACC 17760
GTCAGTATCA TGGAAAAAAA TTCGTGAAAA ATCAGGTGGT CACTTGTATC ACCACATCCA 17820
TGAATTGGAT TGCGTTCAAT TCCTTATGGG GGGCATGCCT GAAACTGTAA CCATGACAGG 17880
TGGAAATGTG GCCCATGAAG GTGAACATTT CGGTGATGAA GATGATATGA TTTTTGTCAA 17940
TATGGAATTT TCTAATAAGC GTTTTGCCTT GTTAGAATGG GGTTCAGCTT ATCGTTGGGG 18000
TGAACATTAT GTCTTAATCC AAGGAAGCAA AGGTGCCATC CGCTTAGACT TATTCAACTG 18060
TAAAGGAACT CTTAAGCTAG ATGGGCAAGA AAGCTATTTC TTGATTCACG AATCGCAAGA 18120
AGAAGATGAT GATCGGACTC GTATCTATCA TAGTACAGAG ATGGATGGAG CAATTGCTTA 18180
TGGTAAACCA GGTAAACGTA CTCCATTATG GCTATCATCT GTCATTGATA AAGAAATGCG 18240
CTATCTGCAT GAGATTATGG AAGGAGCTCC AGTATCAGAA GAATTTGCAA AACTTTTGAC 18300
AGGTGAAGCT GCCCTAGAAG CAATTGCTAC TGCAGATGCT TGTACCCAGT CTATGTTTGA 18360
AGATCGCAAA GTAAAATTGT CAGAAATTGT AAAATAAATT TTGGTATTCT CCTATTTATA 18420
GGTCGACTTG CTCCTCTGAA AGTACTTTTA GAGGAGCTGT TTGACTTTGC TAGTTTTTGA 18480
AACTGAAATC TATTATACTA CAAACTATTG AAAGCGTTTT AATTTTAAGG TATAATAATC 18540
TCATAGAAAT AAAGAAAAGG AGGAAAGAGG ATGCCACAGA TTAGCAAAGA AGCCTTGATT 18600
GAGCAAATCA AAGATGGAAT CATCGTTTCT TGTCAGGCTC TTCCTCATGA ACCGCTTTAT 18660
ACAGAAGCGG GAGGGGTGAT TCCCTTGCTG GTCAAAGCGG CTGAGCAAGG TGGAGCAGTC 18720
GGTATCCGAG CAAACAGTGT TCGCGATATC AAGGAAATTA AGGAAGTCAC TAAACTTCCA 18780
ATCATTGGGA TTATCAAACG TGATTATCCA CCTCAGGAAC CCTTCATCAC GGCTACTATG 18840
AAAGAAGTTG ATGAATTGGC AGAACTGGAC ATCGAGGTGA


X X VJ^- 1 L. 1 \J\JA 






CGTGAACGCT ACGATGGTTT GGAAATTCAA GAGTTCATTC GTCAGGTTAA GGAGAAATAT 18960
CCTAATCAGC TTTTGATGGC TGATACTAGT ATCTTCGAAG AAGGGCTAGC AGCTGTAGAA 19020
GCAGGAATTG ACTTTGTCGG AACAACCTTA TCAGGCTACA CATCCTACAG TCCAAAAGTA 19080
GACGGTCCAG ATTTTGAATT GATTAAGAAA CTCTGTGATG CTGGTGTAGA TGTCATTGCA 19140
GAAGGAAAAA TTCATACACC AGAACAAGCC AAACAAATCC TTGAATATGG AGTGCGAGGC 19200
ATCGTTGTTG GTGGCGCCAT TACTAGACCA AAAGAGATTA CAGAACGCTT CGTTGCTAGT 19260
CTTAAATAAG ATGTGAGGGG GAGTTTTATG TTTAAAGTTT TACAAAAAGT TGGAAAAGCT 19320
TTTATGTTAC CTATAGCTAT ACTTCCTGCA GCAGGTCTAC TTTTGGGGAT TGGTGGTGCA 19380
CTTTCAAACC CAACCACGAT AGCAACTTAT CCAATACTAG ACAATAGTAT TTTTCAATCA 19440
ATATTCCAAG TAATGAGCTC TGCAGGAGAG GTTGTATTCA GTAATTTGTC ACTACTTCTC 19500
TGTGTGGGAT TATGTATTGG CTTAGCGAAA CGAGATAAAG GAACCGCTGC GTTAGCAGGA 19560
GTAACTGGTT ACTTAGTTAT GACTGCAACG ATCAAAGCTT TGGTAAAACT TTTTATGGCA 19620
GAAGGATCTG CAATTGATAC TGGAGTTATT GGAGCATTAG TTGTCGGAAT AGTTGCCGTA 19680
TATTTGCACA ACCGATATAA CAATATTCAA TTACCTTCCG CTTTAGGATT CTTTGGAGGT 19740
TCACGCTTCG TTCCTATTGT TACATCGTTC TCTTCTATCT TGATTGGCTT TGTCTTCTTT 19800
GTTATTTGGC CACCTTTCCA ACAACTTCTT GTTTCTACAG GTGGATATAT TTCTCAGGCG 19860
GGTCCAATTG GAACTTTTCT ATATGGATTT TTAATGAGAC TTTCTGGAGC AGTAGGCTTA 19920
CATCATATAA TTTACCCTAT GTTTTGGTAT ACTGAACTTG GTGGTGTTGA AACTGTTGCA 19980
GGACAAACAG TGGTTGGAGC TCAAAAAATA TTTTTTGCTC AATTAGCCGA TTTGGCCCAT 20040
TCTGGATTAT TTACAGAAGG AACAAGGTTT TTTGCAGGTC GTTTCTCAAC AATGATGTTC 20100
GGTTTACCGG CTGCCTGTTT AGCGATGTAC CATAGTGTTC CTAAAAATCG TCGTAAAAAA 20160
TACGCGGGTT TGTTTTTTGG AGTTGCTTTA ACATCTTTTA TTACCGGTAT TACAGAACCA 20220
ATTGAATTTA TGTTTCTATT CGTCAGTCCG GTTCTATATG TTGTTCACGC ATTCCTTGAT 20280
GGTGTTAGCT TCTTTATTGC AGACGTCTTA AATATTTCAA TAGGAAACAC ATTTTCAGGA 20340
GGTGTAATCG ATTTCACTTT ATTTGGAATT TTGCAGGGGA ACGCTAAGAC GAATTGGGTT 20400
CTTCAGATTC CATTTGGACT TATTTGGAGT GTTTTGTATT ATATTATTTT TAGATGGTTC 20460
ATTACTCAAT TCAACGTTCT AACGCCAGGG CGAGGAGAAG AAGTAGATTC TAAAGAAATT 20520
TCTGAATCCG CAGATTCAAC TTCAAATACT GCAGATTATT TAAAACAGGA TAGCCTACAA 20580
ATTATCAGAG CCTTGGGTGG ATCAAATAAT ATAGAAGATG TAGATGCTTG TGTGACACGT 20640
TTACGTGTAG CTGTAAAAGA AGTTAATCAA GTTGATAAAG CACTTTTAAA ACAAATTGGT 20700
GCAGTTGATG TCTTAGAAGT GAAGGGTGGC ATTCAAGCAA TCTATGGAGC AAAAGCAATC 20760
TTATATAAAA ATAGTATTAA TGAAATTTTA GGTGTAGATG ATTAAGTACT TACTGACTTA 20820
ATAAAAAACA GAGGAGAGTG ATGGATGAGT AGGATGAAAT GAAATCGCAT ACAAGAAATA 20880
AAGAACTCAT TATCCAAGTT GGATACGCTT ATTACATAGG AGAATACAAA TGAAATTTAG 20940
AAAATTAGCT TGTACAGTAC TTGCGGGTGC TGCGGTTCTT GGTCTTGCTG CTTGTGGCAA 21000
TTCTGGCGGA AGTAAAGATG CTGCCAAATC AGGTGGTGAC GGTGCCAAAA CAGAAATCAC 21060
TTGGTGGGCA TTCCCAGTAT TTACCCAAGA AAAAACTGGT GACGGTGTTG GAACTTATGA 21120
AAAATCAATC ATCGAAGCGT TTGAAAAAGC AAACCCAGAT ATAAAAGTGA AATTGGAAAC 21180
CATCGACTTC AAGTCAGGTC CTGAAAAAAT CACAACAGCC ATCGAAGCAG GAACAGCTCC 21240
AGACGTACTC TTTGATGCAC CAGGACGTAT CATCCAATAC GGTAAAAACG GTAAATTGGC 21300
TGAGTTGAAT GACCTCTTCA CAGATGAATT TGTTAAAGAT GTCAACAATG AAAACATCGT 21360
ACAAGCAAGT AAAGCTGGAG ACAAGGCTTA TATGTATCCG ATTAGTTCTG CCCCATTCTA 21420
CATGGCAATG AACAAGAAAA TGTTAGAAGA TGCTGGAGTA GCAAACCTTG TAAAAGAAGG 21480
TTGGACAACT GATGATTTTG AAAAAGTATT GAAAGCACTT AAAGACAAGG GTTACACACC 21540
AGGTTCATTG TTCAGTTCTG GTCAAGGGGG AGACCAAGGA ACACGTGCCT TTATCTCTAA 21600
CCTTTATAGC GGTTCTGTAA CAGATGAAAA AGTTAGCAAA TATACAACTG ATGATCCTAA 21660
ATTCGTCAAA GGTCTTGAAA AAGCAACTAG CTGGATTAAA GACAATTTGA TCAATAATGG 21720
TTCACAATTT GACGGTGGGG CAGATATCCA AAACTTTGCC AACGGTCAAA CATCTTACAC 21780
AATCCTTTGG GCACCAGCTC AAAATGGTAT CCAAGCTAAA CTTTTAGAAG CAAGTAAGGT 21840
AGAAGTGGTA GAAGTACCAT TCCCATCAGA CGAAGGTAAG CCAGCTCTTG AGTACCTTGT 21900
AAACGGGTTT GCAGTATTCA ACAATAAAGA CGACAAGAAA GTCGCTGCAT CTAAGAAATT 21960
CATCCAGTTT ATCGCAGATG ACAAGGAGTG GGGACCTAAA GACGTAGTTC GTACAGGTGC 22020
TTTCCCAGTC CGTACTTCAT TTGGAAAACT TTATGAAGAC AAACGCATGG AAACAATCAG 22080
CGGCTGGACT CAATACTACT CACCATACTA CAACACTATT GATGGATTTG CTGAAATGAG 22140
AACACTTTGG TTCCCAATGT TGCAATCTGT ATCAAATGGT GACGAAAAAC CAGCAGATGC 22200
TTTGAAAGCC TTCACTGAAA AAGCGAACGA AACAATCAAA AAAGCTATGA AACAATAGTC 22260
CTTAGTTATT CTATAAAAAG TAGTTTTTTA AAGAACCTAA GAGTGTATAC CCCCTTTTCC 22320
CTCTACACAG ATAGTGTAAG AAAAGGGGGC TTTTGTTTAA AATGTAAGAA ACTGTCACGA 22380
AATTAAAATG AAGTTCTTAC ATAAGCGAAT CATAAAAAAT TTCATTTTGA TTTTAAAACA 22440
GTTCAAGAAA GTCAAAAAAT TATTCTATTT GAAAGAGAGG TGCCGACTGT GAAAGTCAAT 22500
AAAATCCGTA TGCGGGAAAC AGTGATTTCC TACGCTTTCC TAGCACCAGT ATTATTCTTC 22560
TTTGTCATCT TTGTGTTGGC TCCGATGGTG ATGGGCTTCA TTACAAGTTT CTTTAACTAC 22620
TCAATGACTA AATTTGAGTT TGTAGGCTTG GATAACTATA TCCGTATGTT TAAAGATCCT 22680
GTCTTTACAA AATCTCTGAT TAACACAGTT ATTTTGGTTA TTGGATCTGT ACCAGTTGTT 22740
GTTCTATTCT CACTCTTTGT AGCATCTCAG ACCTATCATC AAAATGTCAT TGCCAGATCC 22800
TTCTACCGTT TCGTCTTCTT CCTTCCTGTT GTAACGGGTA GTGTTGCCGT GACAGTTGTT 22860
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TGGAAATGGA 


TTTATGACCC 


ACTATCAGGG 


ATTCTAAACT 


TTGTCCTTAA 


GTCCAGCCAC 


22920 


ATCATCAGCC 


AAAACATTTC 


TTGGTTGGGA 


GATAAAAACT 


GGGCATTGAT 


GGCGATTATG 


22980 


ATTATTCTCT 


TGACCACTTC 


AGTTGGTCAG 


CCCATCATCC 


TTTATATCGC 


TGCCATGGGG 


23040 


AATATTGACA 


ATTCACTGGT 


TGAAGCGGCG 


CGTGTTGATG 


GTGCAACTGA 


GTTTCAAGTT 


23100 


TTTTGGAAGA 


TTAAATGGCC 


AAGCCTTCTT 


CCAACAACTC 


TTTATATTGC 


AATCATCACA 


23160 


ACAATTAACT 


CATTCCAGTG 


TTTCGCCTTG 


ATTCAGCTTT 


TGACATCTGG 


TGGTCCAAAC 


23220 


TACTCAACAA 


GTACCTTGAT 


GTACTACCTT 


TACGAAAAAG 


CCTTCCAATT 


GACAGAATAC 


23280 


GGCTATGCCA 


ACACAATTGG 


TGTCTTCTTG 


GCAGTCATGA 


TTGCTATCGT 


AAGCTTTGTT 


23340 


CAATTTAAAG 


T AC T TGG AAA 


CGACGTAGAA 


TACTAAAGAA 


AGGAGACAGC 


TATGCAATCT 


23400 


ACAGAAAAAA 


AACCATTAAC 


AGCCTTTACT 


GTTATTTCAA 


CAATCATTTT 


GCTCTTGTTG 


23460 


ACTGTGCTGT 


TCATCTTTCC 


ATTCTACTGG 


ATTTTGACAG 


GGGCATTCAA 


AT C AC AAC CT 


23520 


GATACAATTG 


TTATTCCTCC 


TCAGTGGTTC 


CCTAAAATGC 


CAACCATGGA 


AAACTTCCAA 


23580 


CAACTCATGG 


TGCAGAACCC 


TGCCTTGCAA 


TGGATGTGGA 


ACTCAGTATT 


TATCTCATTG 


23640 


GTAACCATGT 


TCTTAGTTTG 


TGCAACCTCA 


TCTCTAGCAG 


GTTATGTATT 


GGCTAAAAAA 


23700 


CGTTTCTATG 


GTCAACGCAT 


TCTATTTGCT 


ATCTTTATCG 


CTGCTATGGC 


GCTTCCAAAA 


23760 


CAAGTTGTCC 


TTGTACCATT 


GGTACGTATC 


GTCAACTTCA 


TGGGAATCCA 


TG AT AC T C TC 


23820 


TGGGCAGTTA 


TCTTGCCTTT 


GATTGGATGG 


CCATTCGGTG 


TCTTCCTCAT 


GAAACAGTTC 


23880 


AGTGAAAATA 


TCCCTACAGA 


GTTGCTTGAA 


TCAGCTAAAA 


TCGACGGTTG 


TGGTGAGATT 


23940 


CGTACCTTCT 


GGAGTGTAGC 


CTTCCCGATT 


GTGAAACCAG 


GGTTTGCAGC 


CCTTGCAATC 


24000 


TTTACCTTCA 


TCAATACTTG 


GAATGACTAC 


TTCATGCAAT 


TGGTAATGTT 


GACTTCACGT 


24060 


AACAATTTGA 


CCATCTCACT 


TGGGGTTGCG 


ACCATGCAGG 


CTGAAATGGC 


AACCAACTAT 


24120 


GGTTTGATTA 


TGGCAGGAGC 


TGCCCTTGCT 


GCTGTTCCAA 


TCGTCACAGT 


CTTCCTAGTC 


24180 


TTCCAAAAAT 


CCTTCACACA 


GGGTATTACT 


ATGGGAGCGG 


TCAAAGGATA


ATACTCTGCG 


24240 


AAAATCTCTT 


CAAACTACGT 


CAGCTTCACC 


TTGCCATACT 


TAAGTATTGC 


CTGCGGTTAG 


24300 


CTTCCTAGTT 


TGTTCTTCAA 


TTTTCATTGA 


GTATAGGAAA 


ATCAATCTAT


CAAGATACAG 


24360 


AAGTATATTT 


TATAGATTTA 


GAGAATATAG 


AGGTTATAAG 


TGTCTACAAA


ATGGAGGGTA 


24420 


TGCAGTTACT 


TTATGAAGTT 


TTGTCAGACA 


CTTATAAACT 


TAAGAATGGT 


TTTAGTTAAC 


24480 


TATCAGAAAC 


GAAGGAAAGA 


GTATGATTTT 


TGACGATTTG 


AAAAACATCA 


CCTTTTACAA 


24540 


AGGGATTCAT 


CCTAATTTAG 


ACAAGGCTAT 


CGACTATCTC


TACCAACATC 


GTAAGGATTC 


24600 
TTTCGAATTA GGAAAGTATG ATATTGATGG AGATAAAGTC TTTCTAGTTG TTCAGGAAAA 24660

TGTCCTCAAT CAAGCTGAAA ATGATCAATT TGAGTATCAT AAGAACTATG CAGATTTGCA 24720

TTTGCTGGTA GAAGGACATG AATATTCGAG CTACGGTTCA CGTATCAAAG ACGAGGCAGT 24780

AGCATTCGAC GAAGCGAGTG ACATTGGCTT TGTTCATTGT CATGAACACT ACCCACTCTT 24840

GTTGGGTTAT CACAATTTTG CGATTTTCTT CCCAGGTGAG CCACATCAGC CAAATGGTTA 24900

TGCAGGCATG GAAGAAAAGG TTCGAAAATA TCTCTTTAAA ATTTTGATTG ATTAAAAATA 24960 

GGATGAATTG TTTTTTTGTA AAGCTTTGAT AATACTCTAC CATGAAATTG ATCTTTGTGA 25020

GGTAGAGAAA TGAGAATAAA ATATTTAAAA ATTGGTATCT TCTAAGTATG CTGCAAGAGC 25080

TAGTTTCTTA GATGGACAGG GGATTACAGT TGATGAGATG GCTTGGATAA TTAGGGGCAT 25140

TGTGAATGCA TTGATTGGTA GATACATAAA ATTAGGTACT TATGCGGCTA AGTATGGTAT 25200

TAGTATGGCA CGCTCGATCT TAAGTAGGGT AGCTGCAACT GCAGCAGCAA GAGTAGGATT 25260

ACTGACCAAG ATTTCTGGAT GGATTTTACG AGTAGCTGTG AATGTAGCTG ATGTATATGG 25320

TAATTTTGCC AACAATATTG CTGCAGCTTG GGATGCATAT GATAAAATTC CTAACAATGG 25380

TCGTATAAAC TTTTAAAATG CGAGAATGAA AGCACTTTGT ATTTTTTTAT TGAATATGTT 25440

AGCTTGGACA GTGCTTGCAA TGATAATTCG TGGAGGGCTA GATGGATTTG ATAGGCATAC 25500

TTGGAGTACT ATTTTAATTG CGTCGCTGTT CGGGGTATAT GATTATAAGC CCATAGATAA 25560

AAATAGAAAA AAGTCCAAAA GAAAAAATAG ATTTGTTCAT GGTAGGGACT TATGAAAGCT 25620

TTACTGACAA AAAAGAAAAC AGTTTACAAA GAAAAATGAT GGAGGAGCAA ACATGGCACA 25680

AAAAGGAGTA AGCCTTATCA AGGCAGCATT TGATACAGAT AACTTTCTCA TGCGTTTTAG 25740

TGAGAAGGTC TTGGACATCG TGACAGCCAA TCTTCTTTTT GTCGTCTCTT GTTTACCCAT 25800

CGTGACGATT GGAGTGGCTA AAATCAGCCT CTACGAGACC ATGTTCGAAG TTAAGAAGAG 25860 

CAGACGGGTG CCTGTTTTTA AAATCTATCT AAGATCTTTC AAGCAAAATC TGAAACTAGG 25920

TCTTCAGCTG GGTTTAATGG AGTTAGGAAT TGTGTTTCTT ACCCTTTCAG ATCTCTATCT 25980

TTTCTGGGGT CAAACAGCTC TGCCCTTCCA ATTGCTGAAA GCCATTTGTT TAGGTATTCT 26040

GATTTTTCTT ACTATCGTGA TGCTGGCTAG TTACCCTATC GCGGCACGTT ATGACCTATC 26100

TTGGAAAGAA ATTCTTCAAA AAGGATTGAT GTTGGCTAGT TTTAACTTTC CTTGGTTCTT 26160

CCTCATGTTA GCCATTCTTG TCCTCATTGT GATGGTTCTT TATCTGTCCG CCTTCAGTCT 26220

ACTCTTAGGT GGCTCAGTCT TCCTACTTTT TGGGTTTGGA CTATTGGTCT TTATCCAGAC 26280

TGGATTGATG GAGAAAATTT TCGCAAAATA CCAATAGGAG CTTTATTTCT GAAACTACTT 26340

TCAAAGGCTC CAAACGCTAT TCTATAAGCG AGAAACTAAA ATCGG 26385
(2) INFORMATION FOR SEQ ID NO: 4:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2716 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:

CCTGCCCGCA TTGCCCTAGG CATTAAGTAA ACATATAAAA GCATGTGAGA GACTGTTGGA 60

AAAGCGAGGA AATTTCCCCT CTTTTCCTCT AGTCTCTCCT TTCTTTTGCT GATTTTATTC 120

AAAGAAAATG ATATAATAGT AGTTATGGAG AAAAAGAAAT TACGCATCAA TATGTTGAGT 180 

TCAAGTGAGA AAGTAGCAGG ACAGGGAGTT TCAGGTGCTT ACCGTGAATT AGTTCGTCTT 240

CTTCACCGTG CTGCCAAGGA CCAATTGATT GTTACAGAAA ATCTTCCAAT CGAGGCAGAT 300

GTGACTCACT TTCATACGAT TGATTTTCCC TATTATTTAT CAACCTTCCA AAAGAAACGC 360

TCAGGGAGAA AGATTGGCTA TGTGCATTTC TTGCCAGCTA CACTTGAGGG AAGTTTGAAA 420

ATTCCATTTT TCTTAAAGGG AATTGTGAAA CGCTATGTAT TTTCTTTTTA CAACCGGATG 480

GAGCACTTGG TTGTGGTCAA TCCTATGTTT ATTGAGGATT TGGTAGCAGC TGGTATTCCA 540

CGTGAAAAAG TGACCTATAT TCCTAACTTT GTCAACAAGG AAAAATGGCA TCCTCTACCA 600 

CAAGAAGAGG TAGTCAGACT GCGCACAGAT CTTGGTCTTA GTGACAATCA GTTTATCGTA 660

GTAGGTGCTG GGCAAGTTCA GAAACGTAAA GGGATTGATG ACTTTATCCG TCTGGCTGAG 720

GAATTGCCTC AGATTACCTT TATCTGGGCT GGTGGCTTCT CTTTTGGTGG TATGACAGAT 780

GGTTATGAAC ACTATAAGAA AATTATGGAA AATCCCCCTA AAAATTTGAT TTTTCCAGGC 840 

ATTGTATCGC CAGAGCGGAT GCGCGAATTG TATGCTCTAG CGGATCTTTT CTTGTTGCCT 900

AGTTACAATG AGCTCTTTCC TATGACTATT TTAGAAGCTG CGAGTTGTGA GGCTCCTATT 960 

ATGTTGCGTG ATTTAGATCT CTATAAGGTG ATTTTGGAGG GAAATTATCG GGCGACAGCG 1020

GGTAGAGAAG AGATGAAAGA GGCTATTTTG GAATATCAAG CAAATCCTGC TGTCTTAAAA 1080 

GATCTCAAAG AAAAGGCTAA GAATATTTCC AGAGAGTATT CTGAAGAGCA TCTGTTACAA 1140

ATCTGGTTGG ACTTTTATGA GAAACAAGCC GCTTTAGGGA GAAAGTAAAA AGTGAGGTAA 1200

TCTATGCGAA TTGGTTTATT TACAGATACC TATTTTCCTC AGGTTTCTGG TGTTGCGACC 1260

AGTATTCGAA CCTTGAAAAC AGAACTTGAA AAGCAGGGAC ATGCTGTTTT TATCTTTACG 1320

ACGACAGATA AGGATGTCAA TCGCTACGAA GATTGGCAAA TTATCCGCAT TCCAAGTGTT 1380
CCTTTCTTTG CTTTTAAGGA TCGTCGCTTT GCCTACCGAG GTTTTAGCAA GGCACTTGAA 1440

ATTGCTAAAC AGTATCAGCT AGATATTATC CATACTCAGA CAGAATTTTC TCTTGGCCTG 1500 

TTGGGGATTT GGATTGCGCG TGAATTGAAA ATTCCAGTCA TCCATACCTA TCACACCCAG 1560

TATGAAGACT ATGTCCATTA TATTGCTAAG GGGATGTTGA TCCGGCCGAG TATGGTCAAG 1620

TATCTGGTTA GAGGTTTCCT GCATGATGTG GATGGGGTTA TTTGCCCTAG TGAGATTGTC 1680

CGTGACTTGC TATCTGATTA TAAGGTCAAG GTTGAAAAAC GGGTCATTCC TACTGGGATT 1740 

GAATTAGCCA AGTTTGAGCG TCCGGAAATC AAGCAGGAAA ATTTGAAAGA ACTGCGTAGT 1800 

AAACTAGGGA TTCAAGATGG TGAAAAGACG TTGCTTAGTC TTTCGAGAAT CTCCTATGAA 1860 

AAAAATATTC AAGCAGTTTT AGCAGCCTTT GCTGATGTTC TGAAAGAGGA AGACAAGGTT 1920

AAACTGGTAG TAGCTGGGGA TGGCCCTTAT CTGAATGACC TCAAAGAGCA AGCCCAGAAC 1980 

CTAGAGATTC AAGACTCAGT CATCTTTACA GGGATGATTG CTCCTAGTGA GACGGCTCTT 2040

TACTATAAAG CGGCGGATTT CTTCATTTCG GCATCGACAA GCGAAACGCA AGGTTTGACC 2100 

TACTTGGAAA GCTTAGCCAG TGGAACACCT GTCATTGCTC ACGGAAATCC TTATTTGAAC 2160 

AACCTCATCA GTGATAAAAT GTTTGGAACC TTGTACTATG GAGAACATGA TTTGGCTGGT 2220

GCTATTTTGG AAGCCCTGAT TGCAACACCA GACATGAACG AGCATACCTT ATCAGAGAAA 2280

TTGTATGAGA TTTCAGCTGA GAACTTTGGG AAACGAGTGC ATGAGTTTTA TCTGGATGCC 2340

ATTATTTCAA ATAACTTCCA GAAAGATTTG GCTAAAGATG ATACGGTCAG TCAGCGTATC 2400

TTTAAGACAG TTTTGTATCT TCAGCAACAG GTGGTTGCTG TACCTGTAAA AGGATCTAGA 2460

CGCATGTTGA AGGCTTCAAA AACACAGTTG ATCAGTATGA GAGACTATTG GAAAGACCAT 2520

GAAGAATAGA AAGAGGAACA GCTATGAAAA AAACAATTAA TGAGAAGCGG TCGTGATAAA 2580

AAGATTGCGG GTGTTTGTGC TGGGGTGGCC CATTATCTGG ATATGGATCC GACTATCGTT 2640

CAAGTCATTT GGGGTGTTCT TACTTGCTGT TACGGAGCTG GAATTGTAGC TTACATTATT 2700

TTATGGATTA TCGCGA 2716 
(2) INFORMATION FOR SEQ ID NO: 5: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 13926 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:
CTTTGGTTTT GCCTTATTCA AGACATGAGG GCCATCAGGA ATGATCTGAA ACTGCGAATC 60 
TGTTAACAGT CTATGGAGAG CTTTCATAGA ACTAAGATTC GGTTTATCTT TGCTGCCACA 120

AATTAGTAAG GTTGGATAAG GGTAAGTTCC TGCTATATCC GTTAAATCAA GTGTCTTCAA 180 

CTCCTCAGAA ACTCCGACCA TAAGAGTCTT GTCTGCTCCC TGTTTTTCAA ATACTCTTTT 240 

GGGAAGTAGT TTAAAAATCA GCAATTGAAG ATAAAATAGG ATATTCCCTG CTAATTTAAG 300

CGGGCATCCT GACAGAATCA AAGCTCGAAG ATTTGGTAAA TCGTAACTGG AAAGTTCTAG 360

TGTCAGGGCA GCACCTAAGG ACAATCCAAT CAAAACAAAA GGTTCTGTCT CTTGAGCTAG 420

GTGCTGATAA ACTCGCTCTT TAGCTTGTTG ATAGTTACTA ACTCCAGAAG GAAATAACTC 480

GATAGCCTCA GAAGGATAAT CTGTCAGTAG ATTCCGAACT TCTTTCCAAG ACTCTGCTGA 540

CTGCCCTAAC CCATGCAAAA ATATTAATTT CATCTAGTTC TCCTCAAGGC TTAATTCATA 600 

CAAGCCTCTC ACTGCATTAC AGCCGTAAAT AGCTTCTGCT TGGGTTAAAT CTGCCAAGGT 660 

CAAGACTTTC TCTTCTACCT GTCCTGTTTC TAGCAAATGC TGACGGTAAA TTCCTGGCAA 720

GATTCCAAGT CGGATAGGCG GTGTGTAGAG TTTTCCAGCG ATTTTCAGAA CCAAATTTCC 780

TATAGAGGTT TCAAGCAGTT CTCCTGACTT ATTGTGGTAA ATCTTCTCTT GTTCTCCTAG 840

GCTCAAATGC GGTCGGTGAG TGGTTTTAAA GTAGGTAAAG GATTGATTCA AAGCAGCTTC 900 

CTGAAGACAG ACTTGGGCCT GACAAAAGCT TGTACTGAGA GGGGTTAATA CTTGACGATT 960 

GACTTCTATC TCTCCAGATT TGCTAAGGCT GATTCGCAAG CGGTAATCTC GATTAGCTTC 1020

ACAATCCTGA CACTCTTCCT CAATCTTGTG TCCCAAGTCT TCTGCATCAA AAGGAAAAGC 1080 

AAAATAACGA CTAGCTTTTC TCAGCCTTTC CAGATGTTGT TCTTCAAACA TCAGTTGTTT 1140

TTGGCTGATT TTTCCAGTTG TAATTAATTG GAAGCGAGCT TGTTTACGAT AGAGAACTGC 1200 

TGCCTTTTGA TGAACCTCTC GGTATTCAGA TTCCCATGTG CTATCCCAAG TAATCCCTCC 1260

GCCAACTCCA TAAATGGCTT GACCTTTGTG AAGTTGAATG GTACGAATGG CCACATTAAA 1320

AATCCGTCGT CCATTTGGAA GCAAGAGACC AATCGTTCCA CAGTAGACTC CACGCGGTTG 1380

AGGCTCCAAG TCCTTGATAA TCTCCATTGT CGCAATTTTC GGTGCACCCG TTATGGAACC 1440 

ACAAGGAAAG AGTGAGCGGA AGATTTCAAC AAGGTCCACA TCCTCTCGCA ACTGACTCTT 1500

GATGGTCGAA GTCATCTGCC AAACAGTTGA ATACTGCTCT ACCTGACACA GACGCTCCAC 1560

GTGCTCGCTC CCAACTTCAG AAATACGGTT CATATCATTG CGCAAGAGGT CCACAATCAT 1620

CATATTTTCA GAGCGATTTT TGGGATCCTG TTCCAACCAA CTGGCCTGTT CAAGATCTTC 1680 

TTGGTCAGTT ACCCCACGCT GAGTCGTCCC CTTCATTGGT CGTGTTGTCA ACTCGCGATC 1740 

ATTTTGCTCA AAAAAGAGCT CTGGGCTCAT GGAAATCACT GTCATCTCGT CATGTTCCAC 1800 
ATAGGCATTG TAGCCCGCCT CCTGCTCTAC CACCATACGA TTGTAGATGG CAAAAGGATT 1860

GGCATTTAAC TTTTGCTTAA GTTGGACGGT GTAGTTGACC TGATAGGTAT CTCCCTGCCG 1920

TAAATGATGG TGAATTTGGG CAATGGCCTT TTCATAGTCT GCTGCAGACG TTACTTCCTG 1980 

CCAATTTGAG GGCAAATCAA TATCCTCATA AGTCAGAGGA ATAGGGGAAG TTTCTACGAT 2040

ATCATGAACA GTAAAGTAAA GCAGGTACTC TCCCAGTAGG GGATCCTTGT GAACTGCTAA 2100 

TTTTTCCTCA AAAGCAGGTG CAGCCTCGTA GCTGACATAC CCCACCACAT AATAACCTTG 2160 

CTCTTGGTAG CTTTCCACTT GTGCCAGCAA ATCTGCCACT TCTTCTACAT TTCTCGTTTT 2220

CAACTCTTTA ATAGGCTGGG TAAAGGTATA TCTCTCCCCC AAAGTCCTAA AATCAATCAC 2280

TGTTTTTCTA TGCATACCTT AAGTATAGCA TAAAATAAGA AAACCCTCAT CCGCAAAGCA 2340

GATGAGAGAT TTCAATTATT TAAAGATTGA AGTTTTAAAG CTATTTGTTT GTTGAAGAAG 2400 

TTTCTTATAA ACAGCTTCTT TTAATTTAAC TGTATTATTC ATAGATACTG TTTTATTACC 2460 

GTTTGCTTCT TGTTTAAGAG TTTCGGCATC TTTTTTAACA GCTTCTTTAA ACAATGTCAG 2520

TAAATCATCG TATGATGAAA CGGAAGAACC ATTTACTTCG AATGTTGTTA ATCCTTTCGT 2580 

TGCTTTATCT TTAACTTCTT TGAAGTAAGC TTTTTTAAAT TCTTCAATAG TATTAAATGT 2640 

ATTGTTAGAT ATTTTCTTGA TAATATATTC ATCACTTAGA ACAGACTCAC CATCTGTTTT 2700

AGATTGTTGT TTATATTTAT TTGAAGCATA ACCTAAGAAC CCATTTTCGT ATCCGTAGTA 2760

ACCCCATAAT CTAAAAGCAT TATGTTTGAA TGAAACAGCT CCAGGAGCAC CTTTACTAGT 2820

ATTACCTCCG TAGATACCGG TCATCATTCT AACACCTACA TAAGGTGATT GATCGTTATA 2880

GCTAATTGCT TCGGGTTTAT AGATACCATT ACCTGGATTG CGATTAGTCA TTAATTGTTG 2940

ATCAACTAAA TCATTAACAG ATTGAATATT TAATTCATTT TTCTCTTCTT GACTTAGATT 3000

TCGAATTTTA TCCCATTGAT TTAATTTATT GTTATCACGG TATTCTCTAT CTATTTTTTT 3060 

GAACCATGCA CTATTTAAAT CTTTATTTTG TTGAGAAATC ACAGATTCAG CCTCAATTTC 3120

ATCAAGAAGA GTTAAAGTGT CATTATAACC CTTCATATAT CTATTAATAT CTTCTCGTGT 3180

TTTTAGAGTT TTTGGATCTG TAATATACCA CTGATTCCCA TCATTTTTGC GTTTAAATAC 3240

CATATTAATA CCTAAAGAAC CAAACTCATC AAATCCACTA CCAGTAACAG GAGTTTGTAG 3300

CATACCCTGA GCATATGCTT CAGCATCAGT ACCTTCACGG TGTCCAAAGC CACCTAAGTA 3360

AATCGCACGG TCGTTGACGT GTGTTGTTTC ATGTGTGTAA ACTGAAATAC CGTATTCACC 3420 

AACCATTTCT AAATGAACAT ATTTTACATC AGTTCTAATA TCATCAGAGT TAGGATATAT 3480 

AGCAGCATAA GCTCCTGTTC CATTATAATT ATAATACTTA TCCATAGGAC CAAAGAATTC 3540

TCTAAGAGGA GTATATACTT TGTCGGTATT ATAGCGGCCA TATTTTTCAA CCCATCCACC 3600
AGGAGCGTTA TAACCTTCCC AAATAGGAAT AACAGCATCT CTTAGTAGTC GTTGTTTAAC 3660

GTTATCAGAC GCTAGACGAT ACCAGAAATC ATAATAGTTT CTATAACCAT CTGCAGCTTT 3720

GTTAACGATA TCTTTAATAT CTTCTAATGA TTTTTTACCT AATCGCTCTG CACTACCAAA 3780

GGCAATTGCA TTATAATTTG AAATTAAATA AAGATGTGCT TTATCAATAT TCAGTAGTGG 3840

GAGTATAGTA TTTCTAAGGT GACTTCGTTT TAAATTATCG AATGCACGAT GTTTAGAATT 3900

TTTAATTTCT TCGACCTCAG AAGCGCGTTC TGCGATGTAG ACATGGTCTT CTGTAGCATC 3960

AATAAACCAA TCGTTCATAT TGTCTATATT TGTGAACAAT TGTCTATTAT AATTTAAAAA 4020 

TGCATCTAAA TTACCTGATT TAGTATATTT AGCCAATACT TGACCGAATG CGTCGAATGT 4080 

ACGTGAACCT TTAATGTTGT TCTCTTTAGA ACCGATTTCA ATTAATCTGT CTAATACGCT 4140 

AACTTTTTCA CCATAGAAAT CTGGTTTGAA TAGCATTAAT TCTTTAATAT TAACATCACC 4200

AAATTTAACT CCATAGTAAC GATTTAGGTA AGTTAAACCT AGTAATAAAG CTGCTTTGTT 4260

TTTCTCGACT TTATCACGAA TCATTTGACG AGCAGCTGGA GAATCATTTA GTTGATGTTC 4320

TTCGTTTTGA ACTAATTTTG TGATTAGGTT TGTTAAGTTT TCTTTAACAT CTGTGAAGCT 4380

TTCTTCTAAA TATAAATCTT TGATTGCATT AACTCTATAG TCACCTAATC GATTTAGATG 4440

CTGATACATC GTTTGAGACT GAAGCTCTAC TGATTCTAAA ATAGATTTTA TATCATTAAC 4500 

AAGAGTAGTG TTATCTTTTT GAACGATATT AGGTGTATAT TTAATTCCTA AGTCAGTTAT 4560

AGTATATTCT TTTACATTAC TTAAACCTTC ACTGCTAGAA GACAAGTTAA AGTAATCTTT 4620

TGTACCGTCC GCATAGTGAA CAATAATTTT ATTAGCTTCA TCTAGGTTTG TGATAAACTC 4680

ATTGTTGTTC ATCGCGGTAA CAGAAAGAAC TTCTTTAGTA TTTAGATGGT GTTCTTTATT 4740

TAATTTATTA CCTTGATATA CAATATAATC TTTATTGTAG AATGGTATTA ATTTTTCAAG 4800

ATTTTTATAG GCTTGGTTAT ATTCAGCGTT ATAATCTTGA ATACTAGAAT AGGCTTTTTC 4860

TTCATTAAGT TTTGCAAGAG GAGATAGATC ACTTTCTAAT TTATCAGCAG TAATATTGAA 4920

AGTAGTAACT TTAGCATCAG CTTGTTCTTT AGTTAATTTA GTAAATGTTT TAGATTTCCT 4980 

AAATGATCTA TTACCTGACG AATATCCCTC TACCGCATAT AAATCTTTTA TATGAGCACT 5040

AGCATAATCA GAATCATCAA CGTCGTTAGA GCCGAATAAC TCCTCTCCAC GGATAATCTT 5100

AGCATAGCTG ACAGAATTAC TTACCGTACC TACAGGCCAA GTCTTACTTG CTATTGCTCC 5160 

AACTTCTACT GGATTTGAAA CATCTATTTT ACCTTTTACA ACCGACTCAG TTAGGAGAGC 5220

TTTTGTACCA ATAAGATGGT CTAGAGTTAA TCCATAATCT ACTTTAGGAA CTAACAAGCT 5280

GGCGCGTGTT TTGTTTCCTG TAATAGTAGC ATCAACATAT GCTTTTCTAA CAATTCCTCT 5340
ATAGTTTGTA CCTGCAATTC CCCCTGTATG AGAGCCATTT CCACTTGTAG AGTGTAGTTT 5400

GCCAAAGAAA GCAACATTTT CAATACGAGT TCCATCATTC ATATTATTTA CAAATCCAGC 5460 

AACATTATTA CGACCTGAAA GTGTGCCTGT AATTTTGACA TTTGTAATAA CTGAAGAACC 5520 

TTTCATAGTA TTGGCTAATG ATGCAATATT ATCTTGACCA GAACGTTCTA TCTCTACATT 5580 

TTCAAAATTC ACATTATTTA TCGTTGCGTT TGTTATCACA TTAAATAATG GATGTTCCAA 5640 

TTCAGTAATA GCAAATTGTT TTCCTTCAGA ACTTAAAAGT TTTCCTGTGA ATTCTTTAGT 5700

GATATATGAT TTTCCATTAG GAACAACATT TCTAGCGCTC ATTGATTGTC CCAGACGATA 5760

TTCTTTTGAA GGATCGTTTT GAATAGCTTC CACTAATTCT TTGAAATTAT AATATACATT 5820 

ATCTTCGTGG ACTTTAGGTT TTTCAATATA GTGAACGTAT TCTTCTTCAA ATTTATTATC 5880

AGCAGTTCTA GAGACTAAAT TGTCTGCGAT TGCTGTAACT TTATATACAG GTGTTCCGTT 5940 

AACCGTAGTT TCTTCTATAT TTTTAACAGC TAGTAATGTA GTTTTCTGAT TATTTGAAGT 6000 

TATTTTTAAA TAATAATTGC TCTTATCATC AGGAATAGTT GTTATCAGTG ATTCATTAGT 6060

TTCTTTTCCA TTTTCGTATT TGATTAAATC TGTACGTTTA ATATTTTTAA GCTCAACTTT 6120

TTTAAGATCT AATTGAATAT TTTGATTTTC TAGAGTTTCA GTTTCTTCAC CGTTACCTCT 6180 

GTCGTAAATC ATAGTTGTAG ATAGGGTGTA TTCTTTGTAG TACTCTAGGT TCTTAAATGC 6240

AGCGCTTATA GTTTCTGTTG TTACCTTGTC ATCTGTAAGG ACTACAGTAT TAATAACTTC 6300 

TTCTCCTTTT TTCAATTCAG CTGTGATTGA TTTGATTTTT GTTTTGTTTT GATTTTCTAG 6360

AGTATACTTA GCAACAGCTT CACGTTCCAA TATTTTCTTA TCGGTACTAG TCAATGTTAA 6420

TATTGGCTTT TCAGATAATT CAACCAATTT TTCAATAGTT GCAGTTAATT TTTCAACAGC 6480

TTCGTTAACT TCACTTTGTT TAGCATCTGT ATTAGCTGCA ACTTTTTCAG CCTTTGTAAC 6540 

TTCAGTTTGG AGGTTTTGCC AACTTCTATC ACTGTAATGT TCTTTTACCT TTGTTTTTGC 6600 

ATCTGCAATC GTATTGTTTA ATTCAGTTTT ATCAACGTTT AGAGCGTCAA TAGCCGTTTT 6660

AAGTTTATTT GTCTCGCTAT TTACCTCAGG CTGTTTTACA GGCTCTGAAG CATAGACACC 6720

TTTTGCAGTT TCTAAAACAG GTCCAAGAGC ATTGTAACTT GCTGTAGAAT AATCAGTAGG 6780

AGAAACTGAA CTAGCTTTAT CAATTTGATT ATTTAACTCA CTTTTATCAA CTGGTTCTTT 6840 

AGTACCAATA CCCTTTATTT TATCTTCTGG TTTCGGTGTT TCCTCTACAG CCTTCTCTTC 6900 

TTCAGGAACT TCTGGTTGCT TTTCTGGCTC AACTGGTGCC GTTGGTGCCT GTTCGTCTTC 6960 

TCTTGGCGCG ACTGGTTCAC CTGCTTGTTC AACTTTTGGT TCCTCTGTTG GTTCTGTTTG 7020

TTTTTCTACA GCAGGCGTTT CAACTTTTGG TTGTTCAATA GATTGATTAA CAGTCTCCTC 7080

TTTTGGTTCT ACAGTTTCTT CAGCCTTGGT ATCTGGAGTT GACTCTTCTT GTTTCGGTGT 7140
TTCCTCTACA GCCTTCTCTT CTTCAGGAGC TTCTGGTTGC TTTTCTGGCT CGACTGGTGC 7200

CTTTTCGTCT TCTCTTGGCG CGACTGGTTC ACCTGCTTGT TCAACTTTTG ATTCCTCAGC 7260

TGGTTTGTCT GATGGTTGAC TTTCTGGCTT AACTGCTACT TTTTCCTCTG GTTTTGACTC 7320

AACTTCTCCA CCTACTTCTT CAACTGGAGC TGGTTCTGCT GAATCTTCTT TCCCCTCTTC 7380

TACTTTAGGA AGGGTGTCGT CAGTAGGTTT TACCTCCGAT TTTGGTTCTT CCTTTGGACT 7440 

TTCTTCTGTT TTAGGTGCTT CTTCTTTTGG AGCTTCCTCT GTCTCTACTA CTTGGTTTTC 7500 

TGTCCTAGCT TGCTCCTGAT TTGTTATTGA TTGAGGAGTC TCAACTTCGA CCACAGTCAC 7560

CTCTCCAGGT TTTGCTGAGG TTTCTTCTAA AACAGTGTCC AAGCCAAGCG TTTTGAGGAT 7620

GTCACCTGAT AGATAACCAA CATAGCGATA GCCCTCCATT TCAACAACAC CCTCTCGACT 7680

AGCCAGCGCT AGGGTCGCAA CTGGGTCTAC AGCCCCTGCA CTAGGAAGAA CTACCAATCC 7740

CATAGCTCCA ACTAGAAAGA CGCTAGCAAT TTTCTTTCTC TTGTAGATTA AAAGCAAGCT 7800

CCCAACAGTC AGCAAACCAA AAGCTGTCAA AACAGATGCT TCTGTCCCTG TTTGAGGCAA 7860

CTGATCTTTT TGATACACCA AACCATATAC AACTTCATTC CTGTCAGGCT TTCCTGTCTG 7920

AATTAAATCT TTAGCTTCTT GTGAAATAAT CTCTTTATTT ACATAGTGAT AGGTGGCTGC 7980

GTCCACTACA GAAGGAGCCA TCAAAAGGCT TCCAAGAAAT ACAGAGCCTA CAACTCCCTT 8040

AATCTTACGA ATTGAAAAAC GGTCTTTTTT AAACACTTTT ATCTCCTTTA TTCATTCTCA 8100 

AAACTTCCTA ATAGCATCTT GCGGATAGTG CGCACGCGCA CCTCCGATTA ATTTTGGACG 8160

ACTAGCCAGT GCCGTTACAT GGGCATGACC AATCTCTCTC AAAATAGGGC GAATCGGAAC 8220

CTGAACATGC TTGACATGCA TGCCAATTGC AGTGTCTCCG ATATCCAATC CAGCATGAGC 8280 

CTTGATAAAT TCAACCTCAA CTGGATCCTG CATAAACTTA AAGGCTGCCA ACTGCCCCGA 8340 

ACCTCCTGCA TGAAGAGTAG GATGGACACT GACAATTTCC AGACCAAACT GCTCTGCCAC 8400

CTGACGTTCA ACAACGAGAG CCCGATTGAC ATGCTCACAA CCTTGAACTG CTAAATGGAT 8460

ACCTCTACTA CCTAGAATAT CCAAGATAGT CTCCACTATC AGCTCACCAA TCTCTTGACT 8520 

GGATTCTTTC CCAATATGAC CACCTAGCAC CTCACTAGAA GATAGACCTA AAACAAAAAG 8580 

GGCCCCCTGC TTCAAATTGG TCTTTTCTAA AACATCTTCC ACTACCTGAC GTGTTTCTCT 8640

TTGAATCTGT GTCTCGTTCA TCTCTGTTAC CTCTGTTGTC ACTCTTCTAT CATACCGTTT 8700

TTTCTTGTTT TTAGCAAGAT AGACAACCTA GAAAGTTTGC CCAATTACGC ATAAAACTCC 8760

CAGAATTGAC TGGGAGTTAG CTAGTTTCTA TTCTATTTAT ATATATTTCA ACTTTCGTCC 8820

CTTTTTGGGG TCTAGAATCA ATCTTCATAT GGTAATTGGC TCCAAAATGA AGTTTGAGCC 8880
GTTGATCGAC


ATTTTGAAGA 


CCAACTCCCC 


CACGTTTGAG 


TTGACTTTGA 


CTACTATCAC 


8940 


CAGCATCTTG 


GAAGCCAACG 


CCATCATCCT 


CAATACGGAT 


GACCAATCCC 


GAATCCTGTT 


9000 


TCTGGACAGA 


AAGTTTAATA 


TGGCCCTGAC 


CTTCCTTTTC 


CTTAATGCCA 


TGGTAAAGAG 


9060 


CATTTTCTAC 


AAGGGGTTGT 


AGGACCAGCT 


TGGGTAAGAC 


TAAATTATCA 


AAGGCAACAT 


9120 


TTTCATTAAT 


TTCGTATTCC 


AGCTTATCTC 


CATAGCGTTG 


TTTCTGGATA 


AAGAGATACT 


9180 


GGCGGACATG 


ATTGATTTCG 


TCAGAGAGAC


AAATCAAGTC 


CTTGCCTTGA 


TTGAGCGCCA 


9240 


AGCGGAAATA 


GGTTGCCAAG 


GACTTGGTCA


CCTGCACCAC 


TCGCTGACTA 


TCATGAAATT 


9300 


CAGCCATCCA 


GATGATGGTG 


TCCAAAGTGT 


TATAGAGGAA 


ATGTGGATTA 


ATCTGGCTCG 


9360 


AAAGGGCTTG 


AAGTTGGTAC


TGACGGGTCG 


TTTCTTCCTG 


GCTACGAATA 


GCTACCATCA 


9420 


ACTGATCAAT 


CTGATCCAAC 


ATAGCATTAA 


ATTGGCGAGT 


TACTTCTCTC 


AGTTCATAGG 


9480 


CACCAACTTC 


CTTGGCACGA 


AGATTTTGAG 


CACCAGAAGC 


AATTTCCAAC 


ATGGTTTCTC 


9540 


TCAAATCCTT 


CAAAGGAGCA


ATCCAGCGTT 


TAAGACTGAA 


CCACACTAAG


CAGAGACAGA 


9600 


CAAGAAGAGA 


TGTGACACTG 


GCCCCAAGCA 


AGGTCCACAA 


GAGCTGACTC 


CGAACCTGGT 


9660 


CTAACTTTTC 


CAATGATGAC 


ACGCCAAGCA 


CCGTCCAATC 


AGTTCCTGCA 


ATCTTCTCTT 


9720 


GACTGACGTA 


GGATTTGTGA 


CCAGGAGTAT 


AACCCTGACC 


TGTATCGATG 


TAGGGTTTCA 


9780 


TAGCCTCCAT 


TTTGCTAGAC 


GAACTATAAA


CTGTGTGTTG 


AGGATGGTAG 


ACAAATTCAT 


9840 


GGTTTTCATT 


GATAATGAAG 


GCAAAGCCCT 


GCTGCCCCAA 


CTGGAGTTGA 


TTGAGATAGG 


9900 


CTTCCAGAGT 


TTCATAAGAA 


ATATCCAAAC 


GAAGCACACC 


AAGATTGGCT 


CCCTTTGCAT 


9960 


CAACAAGTTC 


TTGAGTGACA 


GAAATGACCC 


ACTGACTATC 


TGATTTACGA 


GCTGGAGTCA 


10020 


AAACAGGCAT 


AGCTCCCTGA 


TGAATGGCCT 


TTTGGTACCA 


ATCCTCAGCC 


ATCATATCAG


10080 


AGGAAGTTTT 


CATCTGCACA 


CTGTCATCTG 


TAGAAATGAC 


CTGACCAGAT 


TTGGTCACCA 


10140 


GCACAACAGT 


TTTCAAGTCC 


TTATCTGACT 


TCAAGATGGT 


CAAAAACAAA 


TCTCGGATTC 


10200 


CCTCGACCTT 


GTCTTGACTG 


GGATTCTCAG 


CATAGGCCAG






10260


AACCAGTCGA 


GGTGGTTTCT 


AGTTTTTTGA 


TATAAGACTG 


AATAAAGTGG 


CTAGTCTGGC 


10320 


TGATGGTCGT 


TTGGCTGTTG 


CCCTCAATGG 


TGGCCTCAAT 


GGCTGAAGAA 


CTTGATTGAT 


10380 


AGTAGAAAGT 


TCCAACCAGA 


GCTAGGAGAA 


TGAGAAAGAC 


CAGAAAGATG 


GAAATAACCA


10440 


TTCTAACTAA 


AAGAGAAGAA 


CGCTTCATCG 


GTCTTCTCCC 


TTCTTAAACT 


GACGAGGTGT 


10500 


CACACCTGCA 


ATCTGCTTAA 


AACGTTGGGT 


AAAATAGTTC 


ATATCTTCAA 


AACCAACCTT 


10560 


CTCTGCGATC 


TCATAAATCT 


TCAGATCTGT 


AGTTAAAAGC 


AAGAGCTTGG 


CTTGTTTAAC 


10620 


ACGTTCTCTC 


ACCAGATAAT 


CCTGAAAAGG 


CAAGCCCAAC 


TCTTTCTTAA 


TCAAGGAACT 


10680 
CAGATAGGTC


GGACTAAAAC 


CTAAGTCACT 


GGCTAAAGAC 


TTTAAACTAA 


ATTGGCTATC 


10740 


AGCCAGATGA 


GACTGGATTT 


TCTGGGCCAT 


GTTTCCTTCA 


AACCTATTAG 


TCAATAAATC 


10800 


TTGTAACTGC 


TCTTCTTTCT 


CTTCCTTGTC 


TAGTTTTTGT 


TTGATTTTCC 


CCAACATTTC 


10860 


CTCAATATCC 


TGACGAGAAA 


AGGGTTTGAG 


CAGGTAGTCG 


TCCACACCTA 


GTTTGACAGC 


10920 


AGACAAGGCA 


TAATCAAAAT 


CATCGTAACC


TGTTAAAAAG 


ACCAAATGAA 


CCTGAGGATA 


10980 


GGTTTCTCGT 


ACCAGACTGG 


CCAACTGGAT 


GCCATTTAGA 


TGAGGCATGT 


TGATATCGGT 


11040 


TAAAATGATA 


TCTGGCACCT 


GCTTTTGGAT 


CAATTCCCAA 


GCCTGCCTTC 


CATTTTCAGC


11100 


CTGACCGATG 


ATTTCCATAT 


CGTAGGCTGC 


TACATTGACC 


AGTTTAGTCA 


AACCTTGTCT 


11160 


TACCAGATAT 


TCATCTTCTA 


CGATTAAGAT 


TGTGTAGGTC 


ATGCTCTGCT 


CCTTTACCAC 


11220 


TTACTAGTAT 


CAGTATAGCA 


AAATTCTCCT 


CTAACTGCTT 


AGGAAAGACC 


TCTTATACTC 


11280 


AATAAAAATC 


AAAAAGTAAA 


CTAGGAAGAT 


AGCCACAGGT 


TTCTCAAAGT 


ACCGCTTTGA 


11340 


GGTTGTAAAT 


AAAACTGACG 


AAGTCGACTC 


AAAGTATAGC


TTTGAGGTTG 


TAGATAAAAC 


11400 


TGACGAAGTC 


GATAACCCTA 


CATACGGTAA 


GGCGACGCTG 


ACGTGGTTTG 


AAGAGATTTT 


11460 


CGAAGAGTAT 


TAATCAACAT 


AATCTAGTAA 


ATAAGCGTAC 


CTTTTTCTTC 


CATTTGGTCT 


11520 


TTGGGAATAA 


AGCGGATAGA


GAGGCTATTG 


ATACAGTAAC 


GTAAGCCGCC 


CTTGTCCTGT 


11580 


GGACCATCCG 


TAAAGACATG 


CCCAAGGTGA 


GAATCTCCTA 


CTCGGCTCCG 


CACTTCCATA 


11640 


CGCGTCATAT 


TGTAGGACTT 


ATCTTCCTTG 


TAGGTGACAA 


CATCTGGACT 


GATGGGTTGG 


11700 


GTAAAACTAG 


GCCAGCCACA 


ACCAGACTCA 


AATTTGTCTT 


TTGATGAAAA 


GAGAGGTTCC 


11760 


CCAGTTGCTA 


TATCCACATA 


GATACCGGAT 


TCAAATTTAT 


CCCAGTAACG 


GTTTGAGAAA 


11820 


GCTCGTTCTG 


TTTGATTTTC 


CTGGGTAACT 


GCATACTCCT 


CAGGTGACAG 


GGTCTTTTTC 


11880 


AATTCCTCAT 


CACTTGGTTT 


TGGATATTTG 


CTGGCATCAA 


TGACAGGATA 


GGCCGCCTGA 


11940 


TTAACATTGA 


TATGGCAGTA 


GCCATTTGGA 


TTTTTCTTGA 


GATAGTCTTG 


ATGGTAATCC 


12000 


TCAGCCACCA 


CAAAATTCTT 


CAAGTTTTCC 


TTTTCAACTG 


CTAGAGGTTG 


ATCGTATTTC 


12060 


TTAGCCACCT 


CATCAAAGAC


TTGGTTAATC 


ACTTCCAAAT 


CCTTGTCATC 


TGTGTAATAA 


12120 


ACACCAGTAC 


GGTACTGGGT 


CCCCACATCA 


TTTCCTTGTT 


TATTTTTGCT 


GGTTGGATTG 


12180 


ATAATGCGGA 


AATAGTGAAG 


CAGGATTTCC 


TTGAGAGAAA 


TTTGCTTGGC 


ATCATAGGTG 


12240 


ACATGGACGG 


TTTCTGCATG 


ACCTGTTTGG 


TTAATCAATT 


CGTACTTGGT 


TGTTTCTCCT 


12300 


CTACCATTTG 


CATAGCCTGA 


AACGGCATCC


GTCACCCCGG 


GAACACGTGA 


GAAATATTCC 


12360 


TCCACTCCCC 


AGAAACAACC 


TCCAGCTAGA 


TAAATTTCGT 


GCAAGTCTGC 


GTCTTTACTA 


12420 
ATTTCTGTTT TTTTCACTGC TTTTCCTCCT TGGCTAACTG CCGCCTTTTC AATTTGCGAG 12480

GCATCTGTCT GCCCTGCATT TCGTATCAAT AGAACATAGA AACCGGTTAT GGCTAGAAAA 12540

AATACTCCTA GCAACAAGAA GATTTTTAAC TTATCATTCA TAAGACGCCT CCTAGGCTAA 12600

TTCCTTCAAA GTTTGCAAAA TTGCATCTTT TTCCATGAAT CCTGGATGTG TTTTGACCAG 12660

CTTGCCTTCT TTGTCTATAA AGGCTTGGGT TGGGTAAGAA CGGACACCAT AAGTTTCCAA 12720

AAGTTTGCCT GATGGGTCAA CTAGGACTGG GAGATTTTTA TAATCCAATC CCTTATACCA 12780 

ATTCTTAAAG TCCGCTTCAG ATTGCTCTCC CTTATGTCCT GGTGACACTA CTGTCAAGAC 12840

CACATAGTCA TCACCAGCTT CTTTAGCAAT CTCATCCGTA TCTGGAAGAC TAGCCAGACA 12900

GATGGAACAC CAAGAAGCCC AGAATTTGAG ATAGACTTTC TTGCCCTTGT AATCAGATAA 12960

ACGGTAGGTC TTGCCATCTA CTCCCATCAA TTCAAAATCA GCCACCTCTT TCCCTTTAGC 13020

TGCGCTTGTT TTACTAGCTG TCTGCTCCGT CTTCATTTCA TCTTTCGTTT GGTGTTCACT 13080

AGTCACGGAC TTGCCTGAAC AAGCCGTCAA ACAAAGGAGC GAACCTGCTC CAAGAACACA 13140

TGTTTGCCAT TTTTTCATAT TGATATTCCT TTCCATTTTA TTCAAATAAT TGACTTAAAA 13200

TTGAAGCATT TCCAAACAGA ACCAAGAAGC CCATCACAAT AATGAGAAAA CCACCCACTT 13260

TTTTGAGGAT TCCGAGATAG GGATGAAGTT TTCGGAAATG TTTCAAAACA TAACTAGAGG 13320

TCAGAGCTAG AAGCAAGAAT GGTAGCGCCA AGCCCAGCGT ATACACCAAC ATGAGACCAG 13380 

CTCCCTGCCA AGCTCCTGAA CCACCTGAAG CCGCCAAGGC CAAAACAGAC CCCAGAACCG 13440

GCCCCACGCA AGGCGTCCAA GCAAAACTAA AGGTCAAGCC CAATAAAAAT GCCTGACTAT 13500

AGCCCTTACC ATTTTGCCCC TGTCCTTGCA GTTGTAGCCT CTTTTCCTTA TAAAGCCCCT 13560

TAAAGTGTAG AATCTCCATT TGGTGCAAAC CAAGAAGGAT AATAATTGCC CCAGTAAGAT 13620

ATTGGAACCA AGAAGCATAA AGCAAATCGC CTAAAAAACC AGCTCCATAG CCCAACAAAA 13680

TAAATATAAA GGAAATTCCT GCTATAAAGG CCAGAGTTCG TAATAAACTA GTAACTGAGA 13740 

TTGAAAATTT GCCGCTAGAA GCCTGAGCAC CATCCTTATC ATCTAGTAAC ACTCCTGTAT 13800

AGACCGGTAA CAAAGGTAAG ATACAAGGAG AAAAGAAGGA TAGAATCCCT GCCAAAAAGA 13860

CACTTAGAAA AAAGAAAATA TGACCCATAA AGTTCCTCCT ATCATTTTAT TGATAGATTT 13920

ATTATA 13926
(2) INFORMATION FOR SEQ ID NO: 6: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20199 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:

CCCAGCAGAA AAATGGCATT TGGAGATAAT GGAAATCGTA AAAAAACTAT GTTTGAGAAA 60

ATAACCTTGT TTATCGTGAT TATCATGCTA GTAGCAAGTT TATTGGGAAT TTTTGCAACT 120

GCAATTGGTG CCCTCAGTAA TCTATAAAAT AGATTCAAGA AAATTTAGTG ACTGGGATTT 180

CCCAGCCCTT TTTTAAAGTG AGAAGAAATA ATGAGTATGT TTTTAGATAC AGCTAAGATT 240

AAGGTCAAGG CTGGTAATGG TGGCGATGGT ATGGTTGCCT TTCGTCGTGA AAAATATGTC 300 

CCTAATGGAG GCCCTTGGGG TGGTGATGGT GGTCGTGGAG GCAATGTGGT CTTCGTTGTA 360

GACGAAGGAC TACGTACCTT GATGGATTTC CGCTACAATC GTCATTTCAA GGCTGATTCT 420

GGTGAAAAAG GGATGACCAA AGGGATGCAT GGTCGTGGTG CTGAGGACCT TAGAGTTCGA 480 

GTACCACAAG GTACGACTGT TCGTGATGCG GAGACTGGCA AGGTTTTAAC AGATTTGATT 540

GAACATGGGC AAGAATTTAT CGTTGCCCAC GGTGGTCGTG GTGGACGTGG AAATATTCGT 600 

TTCGCGACAC CAAAAAATCC TGCACCGGAA ATCTCTGAAA ATGGAGAACC AGGTCAGGAA 660 

CGTGAGTTAC AATTGGAACT AAAAATCTTG GCAGATGTCG GTTTAGTAGG ATTCCCATCT 720

GTAGGGAAGT CAACACTTTT AAGTGTTATT ACCTCAGCTA AGCCTAAAAT TGGTGCCTAC 780

CACTTTACCA CTATTGTACC AAATTTAGGT ATGGTTCGCA CCCAATCAGG TGAATCCTTT 840

GCAGTAGCCG ACTTGCCAGG TTTGATTGAA GGGGCTAGTC AAGGTGTTGG TTTGGGAACT 900 

CAGTTCCTCC GTCACATCGA GCGTACACGT GTTATCCTTC ACATCATTGA TATGTCAGCT 960

AGCGAGGGCC GTGATCCATA TGAGGACTAC CTAGCTATCA ATAAAGAGCT GGAGTCTTAC 1020

AATCTTCGCC TCATGGAGCG TCCACAGATT ATTGTAGCTA ATAAGATGGA CATGCCTGAG 1080

AGTCAGGAAA ATCTTGAAGA CTTTAAGAAA AAATTGGCTG AAAATTATGA TGAATTTGAA 1140

GAGTTACCAG CTATCTTCCC AATTTCTGGA TTGACCAAGC AAGGTCTGGC AACACTTTTA 1200

GATGCTACAG CTGAATTGTT AGACAAGACA CCAGAATTTT TGCTCTACGA CGAGTCCGAT 1260

ATGGAAGAAG AAGCTTACTA TGGATTTGAC GAAGAAGAAA AAGCCTTTGA AATTAGTCGT 1320

GATGACGATG CGACATGGGT ACTTTCTGGT GAAAAACTCA TGAAACTCTT TAATATGACC 1380 

AACTTTGATC GTGATGAATC TGTCATGAAA TTTGCCCGTC AGCTTCGTGG TATGGGGGTT 1440 

GATGAAGCCC TTCGTGCGCG TGGAGCTAAA GATGGGGATT TGGTCCGCAT TGGTAAATTT 1500 

GAGTTTGAAT TTGTAGACTA GGAGACTGGT ATGGGAGATA AACCGATATC TTTCCGAGAT 1560

GCGGATGGTA ATTTTGTTTC CGCCGCAGAC GTTTGGAATG AAAAGAAATT GGAAGAACTA 1620
TTTAATCGTC TCAATCCAAA TCGTGCCTTG AGATTGGCAC GAACTAAAAA GGAAAATCCA 1680

TCTCAGTAAA GAAGCTAAAA AATCCCGTGC CTCATCAGAC ACGGGATTTT GTGGTACGAC 1740

AGGCATGTAT AGCAAACTGA ATCTGGAATA GCACAGCATA TCTTCTAAAA TATAGTAAAA 1800 

TGAAATGAGA ACAGGACAAA TCGATCAGGA CAGTAAAATC GATTTCTAAC AATGTTTTAT 1860

AAGCAGAGAT GTACTATTCT AGTTTCAATC AACTATATTG TTATAAATTG ATTTGAATTT 1920

CAAAATTAAA TTGTTTGATT CTTATTTCAA TTTGTTATAG TATATCTGAT GTCAAAGTTC 1980

TCGGCGAGTC AAATAGCGAT TCCCAAGCCT GACTATCGTG AGGTAGCGGA TTAAAATGGT 2040

CTGGGGATAG ACCGTTTTAA GTCTGACGCT GGAAATAAGA ATTGTCAGAA GAAGGGATAG 2100

CGAAATCGTG GCTCTACGAA CAGGAACGTG ATAATAAGGC GTATATAGCG GATAAGAGGG 2160

CATCAAACTC TAAAGTCCAA AAAGGTAGTC GTAACCTATA TGCGTAAATC ACGAGAGTAA 2220

TTGAATTCGT ACTAAGATTT TCTATTTTCA CTGTAACCTT TTAACGCCCT TATATCTTGT 2280

ATACACGAGG AAAGATGTAC GACTTATCCC GTGAGGTCTA TCACTATAAA GAGAAAACGA 2340

CAGATAGAAG TGATCCTGAG TCACGGTTAT CTGTCTGATA GGACGGTATG TATAAAACGC 2400 

TTCTGTGAAC TGAGAGAAGG GGGAGAAGTT CTTGCTAAAA TTTAGTTGAA CAGCCGTATT 2460

CCGATACTTA GATAAGAGAT CTAGTCTTAG CTCCTACTCA GTTTTAGGGG ATAAAAAAGG 2520

GGCAATAGCG ATTCGAGAAA GATTATACTC TTCGAAAATC TCTTCAAATC ACGTCAATAT 2580

CGCCTTGTCG TATGTGTAGG ATACTGACTA CGTCAGTTCC ATCTACAACC TCAAAACAGT 2640

GTTTTGAGCA ACCTGCGGCT AGTTTCCTAG TTTGATCTTT GATTTTCATT GAGTATTAGT 2700

AATTCAGTTA CTAACTCGTC AACTCTGATT TATCCAATAA AATTGAAAAG GATGGAAAAA 2760 

AGGATAAATT TATGATATAC TTTATTTTGA AGACCTTATT AGAAATCTTG AAAGAGTATT 2820

GAAAACTTAG AATGAGAAAA ATTGTTATCA ATGGTGGATT ACCACTGCAA GGTGAAATCA 2880

CTATTAGTGG TGCTAAAAAT AGTGTCGTTG CCTTAATTCC AGCTATTATC TTGGCTGATG 2940

ATGTGGTGAC TTTGGATTGC GTTCCAGATA TTTCGGATGT AGCCAGTCTT GTCGAAATCA 3000

TGGAATTGAT GGGAGCTACT GTTAAGCGTT ATGACGATGT ATTGGAGATT GACCCAAGAG 3060

GTGTTCAAAA TATTCCAATG CCTTATGGTA AAATTAACAG TCTTCGTGCA TCTTACTATT 3120

TTTATGGGAG CCTCTTAGGC CGTTTTGGTG AAGCGACAGT TGGTCTACCG GGAGGATGTG 3180 

ATCTTGGTCC TCGTCCGATT GACTTACACC TTAAGGCGTT TGAAGCTATG GGTGCCACTG 3240 

CTAGCTACGA GGGAGATAAC ATGAAGTTAT CTGCTAAAGA TACAGGACTT CATGGTGCAA 3300

GTATTTACAT GGATACGGTT AGTGTGGGAG CAACGATTAA TACGATGATT GCTGCGGTTA 3360

AAGCAAATGG TCGTACTATT ATTGAAAATG CAGCCCGTGA ACCTGAGATT ATTGATGTAG 3420
CTACTCTCTT


GAATAATATG 


GGTGCCCATA 


TCCGTGGGGC 


AGGAACTAAT 


ATCATCATTA


3480 


TTGATGGTGT 


TGAAAGATTA 


CATGGGACAC 


GTCATCAGGT 


GATTCCAGAC 


CGCATTGAAG 


3540 


CTGGAACATA 


TATATCTTTA 


GCTGCTGCAG 


TTGGTAAAGG 


AATTCGTATA 


AATAATGTTC 


3600 


TTTACGAACA 


CCTGGAAGGG 


TTTATTGCTA 


AGTTGGAAGA 


AATGGGAGTG 


AGAATGACTG 


3660 


TATCTGAAGA 


CAGCATTTTT


GTCGAGGAAC 


AGTCTAATTT 


GAAAGCAATC 


AATATTAAGA 


3720 


CAGCTCCTTA 


CCCAGGCTTT 


GCAACTGATT


TGCAACAACC


GCTTACCCCT 


CTTTTACTAA 


3780 


GAGCGAATGG 


TCGTGGTACA 


ATTGTCGATA 


CGATTTACGA 


AAAACGTGTA 


AATCATGTTT 


3840 


TTGAACTAGC 


AAAGATGGAT


GCGGATATTT 


CGACAACAAA 


TGGTCATATT 


TTGTACACGG 


3900 


GTGGACGTGA 


TTTACGTGGG 


GCCAGTGTTA 


AAGCGACCGA 


CTTAAGAGCT 


GGGGCTGCAC 


3960 


TAGTCATTGC 


TGGGCTTATG 


GCTGAAGGTA 


AAACTGAAAT 


TACCAATATC 


GAGTTTATCT 


4020 


TACGTGGTTA 


TTCTGATATT 


ATCGAAAAAT


TACGTAATTT 


AGGAGCGGAT


ATTAGACTTG 


4080 


TTGAGGATTA 


AACCGTAGAG 


GTGTTTATGA 


ATATTTGGAC 


CAAATTAGCA 


ATGTTTTCTT 


4140 


TTTTTGAAAC 


GGATCGCTTG 


TATTTGCGTC 


CTTTCTTTTT 


TAGTGATAGT 


CAGGACTTCC 


4200 


GCGAGATAGC 


TTCAAATCCA 


GAAAATCTTC 


AATTTATTTT 


CCCAACGCAG 


GCAAGTCTGG 


4260 


AAGAAAGTCA 


ATATGCACTG


GCCAATTACT 


TTATGAAGTC 


CCCTTTGGGA 


GTGTGGGCAA 


4320 


TTTGTGACCA 


GAAAAATCAA 


CAAATGATTG 


GTTCTATTAA 


ATTTGAGAAG 


TTAGATGAAA 


4380 


TCAAAAAAGA 


AGCTGAGCTT 


GGCTATTTTT 


TGAGAAAAGA 


TGCTTGGTCG 


CAAGGATTTA 


4440 


TGACAGAGGT 


TGTTAGAAAA 


ATTTGTCAGC 


TTTCTTTTGA 


GGAATTTGGC 


TTAAAACAAT 


4500 


TATTTATCAT 


TACCCACCTT 


GAAAATAAAG 


CTAGCCAAAG 


AGTTGCTCTT 


AAGTCTGGAT 


4560 


TTAGTTTGTT 


CCGTCAGTTT 


AAGGGAAGTG 


ATCGTTACAC 


AAGAAAAATG 


CGGGATTATC 


4620 


TTGAATTTCG 


GTATGTAAAA 


GGAGAGTTCA 


ATGAGTAAGC 


ATCAGGAAAT 


TCTAAGCTAT 


4680 


TTGGAGGAAT 


TACCAGTAGG 


TAAAAGGGTC 


AGTGTTCGTA 


GCATTTCGAA 


TCATCTAGGA 


4740 


GTTAGTGATG 


GAACAGCCTA 


TCGGGCTATT 


AAAGAAGCTG 


AAAACCGTGG 


AATTGTGGAG 


4800 


ACCCGTCCTA 


GAAGTGGAAC 


AATTCGTGTT 


AAATC C C AG A 


AAGTTGCTAT 


AGAGAGATTA 


4 860 


ACGTTTGCTG 


AAATTGCAGA 


AGTGACTTCT 


TCTGAGGTTC 


TGGCTGGGCA 


AG AAGGTT T A 


4920 


GAGAGAGAAT 


TTAGTAAGTT 


TTCAATTGGT 


GCCATGACTG 


AACAAAATAT 


CTTGTCTTAC 


4980 


CTTCATGATG 


GGGGGCTCTT 


GATTGTCGGA 


GACCGAACCC 


GTATTCAGTT 


GCTAGCCTTG 


5040 


GAAAATGAAA 


ATGCAGTTCT 


GGTTACAGGG 


GGATTTCAGG 


TTCATGATGA 


TGTGCTTAAA 


5100 


CTGGCCAATC 


AAAAAGGGAT 


TCCTGTTCTA 


AGAAGTAAGC 


ATGATACCTT 


TACCGTCGCG 


5160 
ACCATGATCA ATAAAGCCTT GTCAAATGTC CAAATCAAGA CTGATATTCT GACAGTTGAG 5220

AAACTTTATC GCCCTAGTCA TGAGTATGGT TTTCTGAGAG AGACAGATAC AGTTAAAGAT 5280

TATTTGGACT TGGTTCGTAA GAATCGTAGC AGCCGTTTCC CTGTTATCAA TCAACATCAG 5340

GTCGTTGTTG GTGTTGTAAC CATGAGAGAC GCTGGTGATA AATCACCAAG CACGACAATT 5400 

GATAAGGTTA TGTCTCGTAG TCTATTTTTG GTTGGATTAT CGACAAATAT TGCCAATGTG 5460

AGTCAACGGA TGATCGCAGA AGACTTTGAA ATGGTACCAG TTGTTCGAAG CAATCAAACT 5520

TTGCTTGGCG TTGTGACGCG ACGAGATGTC ATGGAGAAGA TGAGCCGTTC CCAAGTTTCG 5580

GCTCTACCAA CTTTTTCTGA GCAGATTGGA CAAAAGCTCT CTTATCACCA TGATGAAGTA 5640

GTCATTACAG TGGAACCCTT TATGCTAGAA AAAAATGGAG TTTTGGCTAA TGGTGTATTG 5700

GCAGAAATTC TGACCCACAT GACCCGATTT AGTTGTTAAT AGTGGTCGCA ATCTCATTAT 5760

CGAGCAGATG CTGATCTACT TTTTGCAGGC TGTTCAGATA GATGATATAT TGCGCATTCA 5820

GGCACGGATT ATTCATCATA CGAGACGGTC AGCTATAATT GATTACGATA TTTATCATGG 5880

TCACCAGATT GTTTCAAAAG CAAATGTGAC TGTTAAAATT AATTAGAAAC TAGGAGAAAA 5940

GATGATAACA TTAAAATCAG CTCGTGAAAT CGAAGCTATG GACAAGGCTG GTGATTTTCT 6000 

AGCAAGTATT CATATAGGCT TACGTGATTT GATTAAGCCA GGCGTAGATA TGTGGGAAGT 6060

TGAAGAATAT GTCCGCCGTC GTTGTAAAGA AGAAAATTTC CTTCCACTTC AGATTGGGGT 6120

TGACGGTGCC ATGATGGACT ATCCTTATGC TACCTGTTGC TCTCTTAACG ATGAAGTGGC 6180 

TCACGCTTTC CCTCGTCATT ATATCTTGAA AGATGGTGAT TTGCTCAAAG TTGATATGGT 6240

TTTGGGAGGT CCCATTGCTA AATCTGACCT AAATGTCTCA AAATTAAACT TCAACAATGT 6300

TGAACAAATG AAAAAATACA CTCAGAGCTA TTCTGGTGGT TTAGCAGACT CATGTTGGGC 6360

TTATGCTGTT GGTACACCGT CCGAAGAAGT CAAAAACTTG ATGGATGTAA CCAAAGAAGC 6420

TATGTACAAG GGTATTGAGC AAGCTGTTGT TGGAAATCGT ATCGGTGATA TCGGTGCGGC 6480

TATTCAAGAA TACGCTGAAA GTCGTGGTTA CGGTGTAGTG CGTGATTTGG TTGGTCATGG 6540

TGTTGGCCCA ACTATGCACG AAGAACCAAT GGTTCCTAAC TATGGTATTG CAGGTCGTGG 6600

ACTCCGTCTT CGTGAAGGAA TGGTCTTAAC CATTGAACCA ATGATCAATA CAGGCGATTG 6660 

GGAAATTGAT ACAGATATGA AAACTGGTTG GGCGCATAAG ACCATTGACG GTGGATTGTC 6720

ATGTCAGTAT GAACACCAAT TTGTCATTAC GAAAGATGGA CCTGTTATCT TGACTAGCCA 6780 

AGGTGAAGAA GGAACTTATT AATAAAAAGT GAAAAGACTA CTGGAAGTTT ATTTTGATAA 6840

AAAATCCAGT AGATCTTTTC ATAATAAAAC GCATTGTATC AAGTGTTAGG GGCTGATATC 6900 

ATGCGTTTTT CTGCTTTTAA GATTTTTTCC AACTCTGTTT GTAAGCGCAT CATAACAAAG 6960 
GGTCTAGGAT TCAGGGCTCT CCTCCTATAT ACTATTAGTA AAGTAAAACT AAGGGAGGAT 7020

ATTTTAGTGT CGCAGTCTAT TGTTCCTGTA GAGATTCCAC AATATTGTCG TTTTGATTCT 7080

AAAAAGAGAA ATGGAATTCT GTTTAATGTT CGTATTGCCA ATCTTAAATT TACTTTTTTA 7140

TATTATACTT CCTGCGAAAC AAAATATGGT ATAGTAGTTC TATGAATGAT GAAGCAAGTA 7200

AACAACTAAC TGATGCACGA TTTAAGCGTC TTGTTGGTGT TCAGCGTACC ACTTTTGAAG 7260

AGATGTTAGC TGTATTAAAA ACAGCTTATC AACTTAAACA CGCAAAAGGT GGACGAAAAC 7320 

CTAAATTAAG CCTAGAAGAC CTTCTTATGC CCACTCTTCA ATAGTGCGAG AATATCGAAC 7380

TTATGAAGAA ATTGCGGCTG ATTTTGGTAT TCACGAAAGC AACTTTATCC GTCGGAGCCA 7440

ATGGGTTGAA ATAACTCTTG TTCAAAGTGG TTTTACGGTT TCAAGAACTC CTCTCAGTTC 7500

TGAGGACACG GTAATGATTG ATGCGACGGA AGTAAAAATC AATCGCCCTA AAAAAACAAT 7560

TAGCGAATGA TTCTGGTAAA AAGAAATTTC ACGCTATGAA GGCTCAAGCG ATTGTCACAA 7620

GTCAAGGGAG AATTGTTTCT TTGGATATCG CTGTGAACTA TAGTCATGAT ATGAAGTTGT 7680 

TCAAAATGAG TCGTAGAAAT ATCGAACAAG CTGGTAAAAT CTTGGCTGAC AGTGGTTATC 7740

AAGGGCTCAT GAAGATATAT CCTCAAGCAC AAACTCCACG TAAATCCAGC AAACTCAAGC 7800

CGCTAACAGC TGAAGATAAA GCCTATAACC ATGCGCTATC TAAGGAAAGA AGCAAGGTTG 7860

AGAACATCTT TGCCAAAGTA AAAACGTTTA AAATATTTTC AACAACCTAT CGAAATCATC 7920

GTAAACGCTT CGGATTACGA ATGAATTTGA GTGCTGGTAT TATCAATCAT GAACTAGGAT 7980

TCTAGTTTTG CAGGAAGTCT ATTGAGGTAT TGAGCTAGTT TATGAAAAAA TTGGGTGAAA 8040 

AGTCGAGTGT TTTAGAAACC CACAGTGTAG TATTCTAGTT TCAATCCACT ATATTTTGCT 8100 

ACTCCCCGTA AAGTTTCTAT TTTCCCTGAT TTCTGATATA ATAGAAATAT TGACTTCAAG 8160 

AGTAAGGAAG AGAAGATGAA CGCATTATTA AATGGAATGA ATGACCGTCA GGCTGAGGCG 8220 

GTGCAAACGA CAGAAGGTCC CTTGCTAATC ATGGCAGGGG CTGGTTCTGG AAAGACTCGT 8280

GTTTTGACCC ACCGTATCGC TTATTTGATT GATGAAAAGC TGGTCAATCC TTGGAATATC 8340

TTGGCCATTA CCTTTACCAA CAAGGCTGCG CGTGAGATGA AAGAGCGTGC TTATAGCCTC 8400

AATCCAGCGA CTCAGGACTG TCTGATTGCG ACCTTCCACT CCATGTGTGT GCGTATTTTG 8460

CGTCGCGATG CGGACCATAT TGGCTACAAT CGTAATTTTA CAATTGTGGA TCCTGGTGAA 8520

CAGCGAACGC TCATGAAACG TATTCTCAAA CAGTTGAACT TGGACCCTAA AAAATGGAAT 8580 

GAACGAACTA TTTTGGGGAC CATTTCCAAT GCTAAGAATG ATTTGATTGA TGATGTTGCT 8640 

TATGCTGCCC AAGCTGGCGA TATGTATACG CAAATTGTGG CCCAGTGTTA TACAGCCTAT 8700 
CAAAAAGAAC TTCGTCAGTC TGAATCCGTT GACTTTGATG ATTTGATTAT GCTGACCTTG 8760

CGTCTCTTTG ATCAAAATCC TGATGTTTTG ACCTACTACC AGCAAAAATT CCAATACATC 8820

CACGTTGATG AGTACCAAGA TACCAACCAC GCTCAGTACC AATTGGTCAA ACTCTTGGCT 8880

TCCCGTTTTA AAAATATCTG TGTGGTTGGG GATGCGGACC AGTCTATCTA CGGTTGGCGT 8940 

GGTGCTGATA TGCAGAATAT CTTGGACTTT GAAAAGGATT ACCCCAAAGC CAAGGTTGTT 9000 

TTGTTGGAGG AAAATTACCG CTCAACCAAA ACCATTCTCC AAGCGGCCAA CGAGGTTATT 9060

AAAAATAATA AAAATCGCCG TCCTAAAAAT CTCTGGACTC AAAACGCTGA TGGGGAGCAA 9120

ATCGTTTACT ATCGTGCCGA TGATGAGCTG GATGAGGCTG TATTTGTAGC CAGAACCATC 9180 

GATGAACTTA GTCGCAGTCA AAACTTCCTT CATAAGGATT TTGCAGTTCT CTATCGGACT 9240

AATGCCCAGT CCCGTACAAT TGAGGAAGCC CTGCTCAAGT CTAACATTCC TTATACCATG 9300

GTTGGCGGAA CCAAATTCTA CAGCCGTAAG GAAATTCGCG ATATTATTGC TTATCTCAAC 9360

CTTATTGCTA ATTTGAGTGA CAATATTAGT TTTGAGCGTA TTATCAACGA GCCTAAACGT 9420 

GGAATTGGTC TAGGTACAGT TGAGAAAATC CGTGATTTTG CAAATTTGCA AAATATGTCT 9480

ATGCTGGATG CTTCTGCTAA TATTATGTTG TCTGGTATCA AGGGTAAGGC AGCCCAATCT 9540 

ATCTGGGATT TTGCCAATAT GATGCTTGAT TTGCGGGAGC AGCTAGACCA CTTAAGCATT 9600

ACAGAGTTGG TTGAGTCCGT CCTAGAAAAA ACAGGTTATG TCGATATTCT TAACTCCCAA 9660

GCGACTCTAG AAAGCAAGGC ACGGGTTGAA AATATCGAAG AGTTTCTTTC TGTTACGAAG 9720

AACTTTGATG ACACCACGGA TGTGACAGAA GAGGAAACTG GTCTGGACAA ACTGAGTCGT 9780 

TTCTTAAATG ACTTGGCTTT GATTGCCGAC ACAGATTCAG GTAGTCAGGA GACATCAGAA 9840

GTGACCTTGA TGACCCTGCA TGCTGCCAAA GGTCTCGAAT TTCCAGTTGT CTTTTTGATT 9900

GGGATGGAAG AAAATGTCTT TCCACTTAGT CGTGCGACTG AAGATTCAGA TGAATTAGAA 9960 

GAAGAGCGCC GTCTAGCCTA TGTAGGTATC ACGCGTGCAG AGAAAATTCT CTATCTGACC 10020

AATGCCAACT CACGCTTGCT TTTTGGTCGT ACCAATTATA ACCGTCCGAC TCGTTTTATT 10080 

AACGAAATCA GTTCAGACTT GCTTGAGTAT CAAGGTCTGG CTCGTCCTGC AAATACAAGC 10140

TTTAAGGCAT CATATAGCAG TGGTAGTATT TCCTTTGGTC AAGGTATGAG TTTGGCTCAG 10200

GCTCTTCAAG ACCGTAAACG CGGTGCTGCC CCAAAATCAA TCCAGTCAAG CGGTCTTCCA 10260

TTTGGTCAAT TTACAGCTGG CGCAAAACCA GCATCTAGCG AGGCAAATTG GTCCATTGGT 10320

GATATTGCTC TCCACAAGAA ATGGGGAGAG GGAACCGTTC TGGAAGTTTC AGGTAGCGGT 10380

GCTAGGCAGG AATTGAAAAT CAATTTCCCA GAAGTAGGTT TGAAAAAACT TTTAGCCAGT 10440 

GTGGCTCCAA TTGAGAAAAA AATCTAATTT TCCATCCTTC TCACGAATAA TAAAGTGAGG 10500 
AGGATTTTTA TGTACAGTAT TTCATTCCAA GAAGATTCAC TATTACCAAG AGAAAGGCTG 10560

GCCAAGGAAG GAGTTGAAGC GCTTAGTAAC CAAGAGTTGC TAGCTATTTT ACTCAGGACA 10620

GGAACACGTC AAGCTAGCGT TTTTGAAATT GCCCAAAAAG TCTTGAACAA TCTTTCAAGC 10680

CTAACGGATT TGAAAAAAAT GACCCTGCAG GAATTGCAGA GTTTGTCTGG TATTGGGCGT 10740

GTTAAGGCCA TAGAATTACA AGCTATGATT GAACTGGGGC ATCGTATTCA CAAACACGAG 10800

ACTCTTGAAA TGGAAAGTAT TCTCAGCAGT CAAAAGTTGG CCAAGAAGAT GCAGCAGGAA 10860

TTAGGGGATA AAAAACAAGA GCACCTGGTG GCACTCTATC TCAATACTCA AAATCAAATC 10920 

ATCCATCAGC AGACCATTTT TATCGGGTCT GTAACTCGTA GTATCGCTGA ACCGCGAGAG 10980

ATTCTTCACT ATGCAATCAA GCATATGGCG ACTTCTCTTA TCTTGGTCCA CAATCATCCT 11040 

TCAGGAGCGG TAGCGCCTAG CCAAAATGAT GATCATGTCA CTAAACTTGT TAAAGAAGCC 11100

TGCGAATTGA TGGGGATTGT TCTCTTGGAC CATTTGATTG TCTCTCATTC TAATTACTTT 11160

AGTTATCGTG AAAAGACAGA TTTAATCTAA AGTTCATTAA CGACATAGTC AAAGAGTTTT 11220

TTATCTTTGG GACGATTTTC AAAAAGAAGT TCTGGATGCC ATTGGACACC GAGAAAGGCG 11280

ACATCATCCG TACTCATGAC AGCCTCAATG ATACCATCTT TAGGATCATG AGCCACAACT 11340 

TTTAAATTTG GTGCTAAGTC CTTGATGCTC TGGTGGTGGA AGGAGTTGAT ATGAGAGATT 11400 

TCTCCATAGA TTTCTTGGAG AACGGTATCT GGTTCTGTTA CCAAGCGTTG AGTTGTGTAC 11460

TCAACAGAAG AATCCTGCCA ATGGTCTTCG ATATCTTGGT ACAAAGTTCC ACCCATGGCA 11520 

ACGTTAAAGA GTTGGGTACC ACGGCAGACA GAGAAAATGG GCTTTTTCTG TTTAATAGCT 11580

TCCTTGATGA GGGCCAGTTC GAAGATATCT CTTTGAAGGT GATAGTCATC ACTATCAATG 11640

GTTTTGGGTT CGCCATAAAA TTTTGGATCG ACATTTTGCC CACCTGTCAA GATGAGCTTG 11700 

TCAATCAAAC TGATATAGTG GCAGGCCATT TCTTGATCAC CAATCGGTAG GATGATGGGA 11760

ATCCCTCCAG CATCTTTAAC GCCTTCAACA AAGCCTTTTG CTGCGTAGCT CATCATGATG 11820 

TCATCATCTG GATGAGTTTT TTCGTTTCCT GTAATCCCAA TAACTGGTTT TTTCATAAAA 11880

TGATTTTCGC TTTCTAATCC TCTTTTCGCA TGAAGTAGAG GAGGGTTTGG AGTTCACTTG 11940 

TCAAATCGAC ATACTGAACG ACCACGTCTT TTGGTAAATG CAGATGGACT GGTGAAAAAC 12000

TGAGAATTCC TTTCACACCA GCATCAACCA AGAGATTAGC AACCTCTTGT GACTTGACGC 12060 

TGGGAACAGT TAGGATAGCA GTCTTCACAT CAGCATCCTT GATTTTATCC TTGATCTGAG 12120 

AAATCCCGTA AATGGGAATC CCGTCAGGAG TTTGGGTACC GACTTCAGGA TGGTCGTCTA 12180

GGTCAAAGGC CATGATAATC TTCATCTTGT TACGTTCGTG GAAGCGGTAG TGGAGAAGGG 12240
CATGGCCCAT ATTTCCAATA CCAACCAGCA TGACATTGGT AATAGAGTTG TCATTGAGCA 12300

AATCGGCAAA AAATGTCATT AGTTTTTTGA CATCATAGCC AAAACCACGA CGACCAAGTT 12360

CACCAAAATA GGAAAAATCA CGACGTACGG TCGCTGAATC AATACCGATA GCCTCTGCAA 12420

TTTGCTTAGA GTTGGCACGT TCAATCTTTT CTGCATGAAA TCTCTTAAAA ATTCGATAGT 12480

AGAGAGAGAG TCTTTTTGCT GTAGCTTTTG GAATAGCAAA CTGTTTATCT TTCACAAAAT 12540

CACAACCTTT CTATTCTTCT ATTTTATAGA AACATTGTGA AAAAATCAAC AAAAATAAGA 12600 

AAAAACTAAG AAAAATCTTA GTTTTGATGT AAAAAATCTG CATGAGATAG AAAACGGTAG 12660

AGGTCTCCGA CCAGCCCCTG ATAAACTTTT TTGCCCCTAA AAGTCAGAGA AGTCACATAA 12720

AGTGTATCTG GTAAGGTTAC ACATCCTGAC AAAGTCAACA TGAGAGCCTC ATGATCCTCA 12780

TACTTGAGAG TACGCTCTAC ATGATAGCAG TCCTTATAGG TCAGTTCAAA CATTTTGGCT 12840

CTATCTTTCC GATTTTGTAA AGACACCACG TTCTACCAAG CTATCCATGA GGAAGTAGAA 12900

TTTTTCCTGA TGAATATGGT GGTCTTCTGA TTTGAAAATA TCAACTAGAC GAAGGCCAAA 12960

CTTGTCAGTG ATATTGATTT TAGCCCCTGT AAGTTCCTTG TTAATGATGA TTTTGAGTTG 13020

GAAGCCTTCA CCGCTGTTTG GCACTTTTTC CAAAAGGCGA GTCAGTTCAT AGTTACCAAC 13080

CTTAGTTTCA AAAAAGGTGT TATCTTTGAG GGTGAATTTT TTAACAGAAG GGCTAAGAGT 13140

GTAATCGTAA CGACAATTTT TTAACTGAAT GATTTTTTCA AATGCCATAT GGCTAACCTC 13200

CGATAATTTC TTTTAAGGTT TTTGCGAGGG TTTGTAGGTC TTCAACGGTA TTTTGTGGCG 13260

ACAAACTGAT GCGAAGGGAT TCCTTCAAGC GTTCTGAATT TGCGCCATAC ATGGCTTCAA 13320

GAACATGGCT GGATTGGACA ACGCCTGCAG TACAGGCTGA GCCAGTAGAG ATTGAAATTC 13380

CAGCTAAATC TAGCCGAAGG AGTAAGAGGT CATTTTTCTG ACCAGGAAAT CCAATATTGA 13440

GAACATAAGG GAGATGATGT TTTCCTCTAT TCAGGTAATA CTGAATGCCC TCCAGCTCTG 13500 

CCAGAAAGGC AGTTTCTAGA TTTTGTACAT GTTGAAAATG TTCTTCTTGT TTTTCTAGGT 13560

CTTCTTTTAG GGCTGCAACC ATGCCTACAA TGGCAGGCAG ATTTTCAGTT CCTGCACGTT 13620

TTTTCTGTTC CTGGTCTCCG CCATGTAGAT AGGAATCAAA GTCCATGCTA GATGCGTAGA 13680

GAAAACCGAT TCCCTTAGGA CCATGGAATT TGTGGGCAGA AGCAGTGAGA AAATCAATGC 13740

CCAATTCTTC TGAATGAATT GGGATTTTAC CAATAGCCTG AACTGCATCA ACATGATAGG 13800

CAGCAGGGTG TTGCTTGAGT ATTTGGCCAA TTTCAGCGAT GGGCAGTAGG TTTCCTGTCT 13860 

CATTATTGAC AAACATGGTA GAAACCAAAA TCGTATCGTC ACGTAAAGCC TTTTGAATTT 13920

GCTGGGCTGT GATTTCTTGA TTTTCTGGCT GGATAATGGT TGCTTCAAAC CCAAAGTGTT 13980

GAACCAAGTA ATCAATTGTT TCAAGGACAG CATGGTGCTC GATGGCAGTT GTGATGATAT 14040
GTTTTCCTTG TTCTTGGTGA CGAAGACAGT AGCCAATGAT GGTAGTATTA TTGCCTTCAG 14100

TCCCACCAGA AGTGAAAAAG ATATGTTGAG GTTTTGTCCT TAGTAACTGG GCTAGTTCCT 14160

GACGGGCTTC TCGCAAGAGT TTGCCAGCTT GACGACCATG ACCATGAATA CTAGAAGGAT 14220 

TTCCGTGGGT TTCTTGCATA ACCTTGGTCA TAGCTGAAAT AGCAACTGCT GACATAGGAG 14280

TCGTTGCAGC ATTGTCCAAA TAAATCAAAG AATCACCTTA TTTCTTTTTA TTGTAGGCAA 14340 

AGAGTGGGCT GACTGGTTTT CTTTCGTGAA TACGGACGAT AGCATCACCA ATTAACTCAC 14400 

TAGCAGTGAT GTAGCATACA TTTTTAGGAG TTTTTTCTTT TGTTGCTACT GAATCAGTCA 14460

CAAGAATTTC TTTAATATTA GTATTGTCAA GAAGCTCAGC AGCTCCCTCG ACGAAGAGAC 14520 

CGTGGCTAGA AACAGCATAA ATTTCTGTAG CTCCTTCACG TTCAACGATT TTAGAAGCTT 14580

CAGAGAAGGT ACGTCCTGTA TTTAAAATAT CATCAATCAA GATAGCTTTC TTACCTTCAA 14640

CATCACCAAT AATATAACCT TCGTTACGAG TTGCATCGTC TTGAGGGTAG TCGATAATGG 14700

CGATAGGAGC ATCAAGATAT TCAGCCAGGC TACGCGCACG TTTGACACCT GAATTTTTAG 14760

GGCTAACGAC AACAACATCT GAACCAAGCA ATCCTTTATC GCAGTAATGT TTTGCGAATA 14820

GGGGAACAGT GAAAAGATTA TCCACTGGAA TATCAAAGAA ACCTTGAACC TGAACGGCAT 14880

GCAAATCAAG AGTCAGGATA CGATCAACTC CAGCCTTAAC CAGCATATTG GCAACTAGTT 14940

TTGCTGTAAG TGGCTCACGA GGACAAGCAA TGCGGTCTTG ACGTGCATAG CCAAAATATG 15000 

GAAGGACAAC GTTGATACTG TGGGCACTTG CACGCACACA AGCATCGACC ATGATTAACA 15060

ATTCCATTAG GTGGTTGTTG ACAGGGAAAC TTGTTGATTG GATGATGTAA ACATCATAAC 15120

CACGGACACT TTCTTCGATA TTTACTTGGA TTTCTCCGTC TGAAAATTGA CGTGATGATA 15180 

GTTTTCCAAG TGGGACACCA ACAGCTTGGG CAATTTTTTG TGCAATCTCT TGGTTAGAGT 15240

TGAGTGCGAA AAGTTTCATG TTTTTTCTAT CTGACATTAT AGACCGTCCT CTGTAAACTT 15300

TATAAATCCT AGTTATATTT ACCTTACATA TATGAACTGG GATTTGTGTA TTTTTATCTT 15360

TTCTATTTTA CCAAAAAATG GAGATTATTT CAGCTATTTT TCATACTTTT GACAAATCGA 15420 

ACCAATTTTG AAGGAGCTTT TTGATAGGAA ATCTGATTTT TCTCTAAAAA TTGTCGAAAA 15480

TCCTGTTTGC CTTGCTCATG ATTTTCCACT TCAAGCTCCA ATTCGTAATC TGTTATATCA 15540 

AAGTATCGGC TCTGATCCAG TGCCATGAGA CCAATAGCTG TTTTCATTTC ATAGCGAAGC 15600 

GTTGTTAGAC AACCAAGAAC CTGCCAGTTC TTACTTTGGA TACCATGTTT CGCCAATTCA 15660

TCCAGTACTA GCCCTTGAGG AAGTTCTTCC TTACTCAGAT AGTTCTCAGC ATCTTTTAGT 15720

TGCAATTTTT GGTTGTATTC CATGTTTCCA ACACTCTGCG GGACTTTGAG TGTCAACTCA 15780 
GCCCAGTCTT CAAAGGTTCG AATGCGCATA GCGACTTTCT TTTCTCGCAG TTCAAAATCA 15840

GGCGTGTCGA TGTAGTAATT TGTTTGAAGA ACAGGAGTGA CACCTGTGAA CTGGTCTTTT 15900

AGACGATTGT ATTCATCTTT TTTCAATAGT GTTTTCAATT CAATTTCTAA ATGTTTCATT 15960

TTTCTTACCT TTTTTTATCG TTGAAAGCGG ATTTATGGTA TAATAAGCAT TGTATTTATT 16020

GTATATGAAT CTGGAGAAAA AATCAAAGAT ATTTTTGACG GATAATATGA GAACAAGGGA 16080 

GAATATATGA CCTTAGAATG GGAAGAATTT CTAGATCCTT ACATTCAAGC TGTTGGTGAG 16140

TTAAAGATTA AACTTCGTGG TATTCGTAAG CAATATCGTA AGCAAAATAA GCATTCTCCA 16200

ATTGAGTTTG TGACCGGTCG AGTCAAGCCA ATTGAGAGCA TCAAAGAAAA AATGGCTCGT 16260 

CGTGGCATTA CTTATGCGAC CTTGGAACAC GATTTGCAGG ATATTGCTGG CTTACGTGTG 16320 

ATGGTTCAGT TTGTAGATGA CGTCAAGGAA GTAGTGGATA TTTTGCACAA GCGTCAGGAT 16380

ATGCGAATCA TACAGGAGCG AGATTACATT ACTCATAGAA AAGCATCAGG CTATCGTTCC 16440

TATCATGTGG TAGTAGAATA TACGGTTGAT ACCATCAATG GAGCTAAGAC TATTTTGGCA 16500 

GAAATTCAAA TTCGTACTTT GGCCATGAAT TTCTGGGCAA CGATAGAACA TTCTCTCAAC 16560

TACAAGTACC AAGGGGATTT CCCAGATGAG ATTAAGAAGC GACTGGAAAT TACAGCTAGA 16620

ATCGCCCATC AGTTGGATGA AGAAATGGGT GAAATTCGTG ATGATATCCA AGAAGCCCAG 16680 

GCACTTTTTG ATCCTTTGAG TAGAAAATTA AATGACGGTG TAGGAAACAG TGACGATACA 16740

GATGAAGAAT ACAGGTAAAC GAATTGATCT GATAGCCAAT AGAAAACCGC AGAGTCAAAG 16800

GGTTTTGTAT GAATTGCGAG ATCGTTTGAA GAGAAATCAG TTTATACTCA ATGATACCAA 16860

TCCGGATATT GTCATTTCCA TTGGCGGGGA TGGTATGCTC TTGTCGGCCT TTCATAAGTA 16920

CGAAAATCAG CTTGACAAGG TCCGCTTTAT CGGTCTTCAT ACTGGACATT TGGGCTTCTA 16980

TACAGATTAT CGTGATTTTG AGTTGGACAA GCTAGTGACT AATTTGCAGC TAGATACTGG 17040

GGCAAGGGTT TCTTACCCTG TTCTGAATGT GAAGGTCTTT CTTGAAAATG GTGAAGTTAA 17100 

GATTTTCAGA GCACTCAACG AAGCCAGCAT CCGCAGGTCT GATCGAACCA TGGTGGCAGA 17160

TATTGTAATA AATGGTGTTC CCTTTGAACG TTTTCGTGGA GACGGGCTAA CAGTTTCGAC 17220 

ACCGACTGGT AGTACTGCCT ATAACAAGTC TCTTGGCGGT GCTGTTTTAC ACCCTACCAT 17280

TGAAGCTTTG CAATTAACGG AAATTGCCAG CCTTAATAAT CGTGTCTATC GAACACTGGG 17340 

CTCTTCCATT ATTGTGCCTA AGAAGGATAA GATTGAACTT ATTCCAACAA GAAACGATTA 17400 

TCATACTATT TCGGTTGACA ATAGCGTTTA TTCTTTCCGT AATATTGAGC GTATTGAGTA 17460

TCAAATCGAC CATCATAAGA TTCACTTTGT CGCGACTCCT AGCCATACCA GTTTCTGGAA 17520

CCGTGTTAAG GACGCCTTTA TCGGCGAGGT GGATGAATGA GGTTTGAATT TATCGCAGAT 17580
GAACATGTCA AGGTTAAGAC CTTCTTAAAA AAGCACGAGG TTTCTAAGGG ATTGCTGGCC 17640

AAGATTAAGT TTCGAGGTGG AGCTATTCTG GTCAATAATC AACCGCAAAA TGCAACGTAT 17700

CTATTGGACG TTGGAGACTA CGTTACCATT GACATTCCCG CTGAGAAAGG CTTTGAAACC 17760

TTGGAGGCTA TTGAGCTTCC ATTAGATATT CTCTATGAGG ATGACCACTT TCTAGTCTTG 17820 

AATAAACCCT ATGGAGTGGC TTCTATTCCT AGTGTCAATC ACTCTAATAC CATTGCCAAT 17880

TTTATCAAGG GTTACTATGT CAAGCAAAAT TATGAAAATC AGCAGGTTCA CATTGTTACC 17940

AGACTAGATA GGGATACTTC TGGCTTGATG CTCTTTGCCA AGCACGGTTA TGCCCATGCA 18000 

CGATTAGACA AGCAGTTGCA GAAGAAATCT ATCGAGAAAC GCTACTTTGC TTTGGTTAAG 18060

GGAGATGGAC ATTTGGAGCC AGAAGGGGAA ATTATTGCTC CGATTGCGCG TGATGAAGAT 18120

TCCATTATTA CCAGACGAGT GGCTAAAGGC GGAAAGTATG CCCATACTTC ATACAAGATT 18180

GTAGCTTCTT ATGGAAATAT TCACTTGGTC TATATTCACC TGCACACTGG TCGAACCCAT 18240

CAAATCCGAG TCCATTTTTC TCATATCGGT TTTCCTTTGC TGGGAGATGA TTTGTATGGT 18300

GGTAGTCTGG AAGATGGTAT TCAACGTCAG GCTCTGCATT GCCATTACCT ATCCTTTTAT 18360 

CATCCATTTT TAGAGCAAGA CTTGCAGTTA GAAAGTCCCT TGCCGGATGA TTTTAGTAAC 18420 

CTTATTACCC AGTTATCAAC TAATACTCTA TAAAAACTGT CTCAGAGTAT AATTATTATC 18480

TTAAAGGAGA AAACTCATGG AAGTTTTTGA AAGTCTCAAA GCCAACCTTG TTGGTAAAAA 18540

TGCTCGTATC GTTCTCCCTG AAGGGGAAGA GCCTCGTATT CTTCAAGCAA CAAAACGCTT 18600

AGTAAAAGAA ACAGAAGTGA TTCCTGTTTT GCTTGGAAAT CCTGAAAAAA TTAAAATTTA 18660

TCTTGAAATT GAAGGAATCA TGGATGGTTA TGAGGTCATC GACCCTCAAC ATTATCCTCA 18720

ATTTGAAGAA ATGGTTTCTG CCTTGGTGGA GCGTCGCAAG GGCAAAATGA CTGAAGAAGA 18780

TGTACGCAAG GTTTTGGTTG AAGATGTCAA CTACTTTGGT GTGATGTTGG TTTACTTGGG 18840

CTTGGTTGAT GGAATGGTGT CAGGAGCGAT TCACTCAACA GCTTCAACAG TTCGCCCAGC 18900

TCTACAAATC ATCAAAACTC GTCCAAATGT AACTCGTACT TCAGGAGCCT TCCTCATGGT 18960 

TCGTGGTACG GAACGTTACC TATTTGGAGA CTGTGCCATT AACATCAATC CAGATGCAGA 19020

AGCCTTGGCT GAAATTGCCA TCAACTCAGC AATCACAGCT AAGATGTTTG GCATCGAACC 19080

TAAAATTGCC ATGTTGAGCT ATTCTACTAA AGGTTCAGGG TTTGGTGAAA GCGTTGATAA 19140 

GGTCGTTGAA GCAACTAAAA TTGCTCACGA CTTGCGTCCT GACCTTGAAA TCGATGGTGA 19200 

GTTGCAATTT GATGCAGCCT TTGTTCCTGA AACTGCAGCT CTGAAAGCTC CTGGAAGTAC 19260

GGTAGCTGGT CAAGCAAATG TCTTCATCTT CCCAGGTATC GAGGCAGGAA ATATTGGTTA 19320
CAAGATGGCT GAACGCCTGG GTGGCTTTGC GGCTGTAGGA CCTGTTTTGC AAGGTTTAAA 19380

CAAGCCAGTT AATGATCTTT CTCGTGGATG TAATGCAGAT GATGTTTACA AGTTGACCCT 19440

CATCACAGCA GCTCAAGCAG TTCATCAATA GTGAAAACTA TAAAGTGATA TACTATGCTA 19500

TACTGTAGTT ATGAAACTAT GTACGAAAAG CACTGCCATT AATTCCTGAG AACTAAATTA 19560

CTGATTGGTG TCAAAAAGGA AAACTTCCAA GCGATGATAT CCTGTCTATA CACGACCTAT 19620

AGAAATCTGT AATATACATA TCCGTAAAAC GATAAATTCC CTTTTTGATT TTAAATGAGT 19680 

ATGAAAAGAG AATTTTTTGG CTCTTTGTCA ACTGTAGTGG GTTGAAGAAA AGCTAAGCTC 19740 

GAGAAAGGAC AAATTTCATC CTTTCTTTTT TGATATTCAG AGCGATAAAA ATCCGTTTTT 19800

TGAAGTTTTC AAAGTTCCGA AAACCAAAGG CATTGCGCTT GATAAGTTTG ATGAGATTAT 19860

TGGTCGCTTC CAGTTTGGCG TTAGAATAGT GTAGTTGAAG GGCGTTGATA ATCTTTTCTT 19920

TATCTTTGAG GAAGGTTTTA AAGACAGTCT GAAAAATAGG ATGAACCTGC TTAAGATTGT 19980 

CCTCAATAAG TCCGAAAAAT TTCTCTGGTT CCTTATTCTG GAAGTGAAAA AGCAAGAGTT 20040

GATAGAGCTG ATAGTGGTGT TTCAAGTCTT CCGAATAGCT CAAAAGCTTG TTTAAAATCT 20100

CTTTATTGGT TAAGTGCATA CGAAAAATAG GACGATAAAA TCGCTTATCA CTCAGTTTAC 20160

GGCTATCCTG TTGAATGAGT TTCCAGTAGC GCTTGATAG 20199 
(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19702 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:
ACCCGATGTA TCAGCGGATA TTTACTCTAT TTTTCAAACG ATGTTATACC CACAATAAAA 60

GAAAAAAGAC CCTAAGGTCT CCTTTGCTTT TATTATTAAA CGCGTTCAAC TTTACCTGAT 120

TTCAAAGCAC GAGCTGAAGC CCAAACTTTT TTAGGTTTAC CATCGATAAG AACAGTAACT 180

TTTTGAAGGT TTGGTTTTAC GGCACGTTTT GTTTGGTTCA TCGCGTGTGA ACGGTTGTTT 240

CCTGATACAG TCTTACGACC TGTAAAGTAA CATACTTTAG CCATTGTGTT TTCCTCCTAT 300

TAGATCTAAT ATAGCGGATG TGCTAGCACC ACATACCGTA CTATGTTATC ACATTTTCTT 360

GTTTTTTGCA AGGGAATTGG AAGATTTTTT ATTTGTGTCT TAAATCAGGT CTTGCGTGAC 420

ATTTCTGCTC TCCACATGCC ATCGTTGATT AACAGAACAC CAGAATTAAA ATTATGTGTA 480

TAAAAATCAT CTCTAACTGC AGCTAAGGGT ATAGCCGTCA AGTCCAAATC CCACAGCTCA 540
TCTATCGATT TTCTTACAAC AATATCTGAA TCCAAATACA GTACACGAGA CTCGCTTACA 600

TACTTTGGAA TAAAATACCT AAAAAAGCCG CATATGAAAG TCCCTCAAAG GGGAGACGAT 660

AACCTTTCAG AATATTACTG TCAATCTAAA CATTCACAAT CTCACTATTC AAAGTCTCTA 720

GTCTTTTTTC CATCAATTGG AACCATTCTC GCGGAAGGTC ATCATTAAAA ACATAAAACT 780

TAAGATTATA ATGATGAACA CAAAGAGATT TTATTGTTGT TTCAACTTTA TCCATATAAG 840

CATTATCTGC ACCTAAGACA ATCGCTTTTT TCTCTTCTTT CACTTTTTAT CTCATTTCTT 900

TTTATTCCCA TCATATTATT CCCATCATAT GTTTCCCATC ATATGTTTCT ACGTAACCAT 960

TATTTTCGCC TATTCGTTCG TAAAACCATA CCAGTGGAGA TTTTAGATGA AGTCCCATTA 1020

CGGTTTACAA TTTTTACATT ACGACACGGA GTTTTACAAA TCGATTTCAT TTGCCAAACG 1080

TAGTTAGTGA GGCAGTTAGC TAGTTCGCCA AATAGCGACT AGCGTCCAAC AATTTGGAAC 1140

TTTAGTTCCA ATTGTTGGTA CTGAGTCACA TCTTCTCCTC TAACTCTACG TCTGGATACT 1200

TGTCCGCAAA CCAGCGGAGG GCAAAGTCAT TTTCAAAGAG AAAGACTGGT TGGTCAAAAC 1260

GGTCTTTGGC TAAGATATTG CGACTTGACG ACATCCGTTC ATCCAAGTCC TCAGGCTTGA 1320

TCCAACGAAC GGTCTTTTTA CCCATTGGGT TCATAACTAC TTCCGCATTG TACTCGCCTT 1380

CCATGCGGTG TTTAAAGACT TCAAACTGGA GTTGACCTAC AGCGCCTAGC ATGTACTCAC 1440

CTGTTTGGTA ATTCTTATAA AGCTGAACGG CTCCTTCTTG CACCAATTGC TCAATCCCCT 1500

TGTGGAAGGA TTTTTGCTTC ATAACATTCT TAGCAGAAAC TTTCATGAAA ATCTCAGGTG 1560

TAAAGGTTGG CAGGGGTTCA AATTCAAACT TGTTTTTTCC AACCGTCAAG GTATCCCCAA 1620

CCTGATAAGT ACCGGTATCG TAAACCCCGA TAATATCACC TGCCACGGCA TTGGTCACAT 1680

TCTCACGACT CTCCGCCATA AACTGGGTAA CATTAGATAG TTTAGCCCCC TTACCAGTAC 1740

GAGGGAGATT GACACTCATG CCGCGCTCAA ATTCGCCAGA TACGATACGG ACAAAGGCAA 1800

TACGGTCACG GTGACGAGGG TCCATGTTGG CTTGGATTTT AAAGACAAAG CCTGAGAAAT 1860

CCTTGTCATA AGGATCCACA ATTTCACCGT CTGTTTTCTT GTGACCATGT GGTTCTGGAG 1920

CAAACTTGAG GAAGGTTTCA AGGAAGGTCT GCACACCAAA GTTTGTCAGG GCTGAACCGA 1980

AAAAGACAGG CGTCAATTCT CCAGCCAGAA TAGCTTCCTC TGAAAACTCA TTCCCGGCTT 2040

CATTTAAAAG CTCAATGTCA TCCTTGACTT GCTCGTAGAA AGGATTGCTA CCAAAGAGTT 2100

TGTCCCCGTC TTCTAGACTG GCAAAACGCT CATCCCCTTT GTAAAGCTCT AAACGTTGGT 2160

TATAGAGGTC ATACAAGCCC TCAAAGGCTT TCCCCATCCC GATAGGCCAG TTCATAGGGT 2220

AGCTAGCAAT GCCCAAGATT TCTTCCAATT CTTGCAAGAG ATCCAAAGGC TCACGACCGT 2280
CACGGTCCAG CTTGTTCATA AAGGTAAAGA CTGGAATGCC ACGATGTTTC ACAACCTCAA 2340

ACAATTTCTT GGTTTGAGCC TCGATCCCCT TGGCAGAGTC CACGACCATG ACCGCAGCAT 2400

CCACCGCCAT CAAGGTACGA TAGGTATCTT CTGAGAAGTC CTCGTGCCCT GGCGTGTCTA 2460

AGATATTCAC GCGCTTGCCG TCGTAGTCAA ATTGCATAAC AGATGAAGTA ACAGAAATCC 2520

CACGTTGCTT CTCGATATCC ATCCAGTCAG ATTTAGCAAA AGTCCCTGTT TTCTTCCCTT 2580

TTACCGTACC AGCCTCACGA ATCTCACCCC CAAAGTAGAG TAACTGCTCA GTGATGGTTG 2640

TTTTCCCCGC GTCCGGGTGG GAGATAATGG CAAAGGTACG ACGTTTCTTA ATTTCTTCTT 2700

GAATATTCAT AAGTTCTCTT TCTTTGATTC TCTATTTTTC TTGTTTCAAT AGCTGAGAAT 2760

GATTTTTACA TTGGATTTTA CCATTCCTTT CAACACTCCA TTATATCGGA TTTTAGCATT 2820

TTTTTCAATT TCTATTTCTT TTCACTTCCC CCTCCCTTAT TTATAGGAAA ATATGGTAAA 2880

ATAGAACAGA CTAAAAATCA TCATTTCACG AAAGGATGCA AGATGAAAAT TACGCAAGAA 2940

GAGGTAACAC ACGTTGCCAA TCTTTCAAAA TTAAGATTCT CTGAAGAAGA AACTGCTGCC 3000

TTTGCGACCA CCTTGTCTAA GATTGTTGAC ATGGTTGAAT TGCTGGGCGA AGTTGACACA 3060

ACTGGTGTCG CACCTACTAC GACTATGGCT GACCGCAAGA CTGTACTCCG CCCTGATGTG 3120

GCCGAAGAAG GAATAGACCG TGATCGCTTG TTTAAAAACG TACCTGAAAA AGACAACTAC 3180

TATATCAAGG TGCCAGCTAT CCTAGACAAT GGAGGAGATG CCTAATGACT TTTAACAATA 3240

AAACTATTGA AGAGTTGCAC AATCTCCTTG TCTCTAAGGA AATTTCTGCA ACAGAATTGA 3300

CCCAAGCAAC ACTTGAAAAT ATCAAGTCTC GTGAGGAAGC CCTCAATTCA TTTGTCACCA 3360

TCGCTGAGGA GCAAGCTCTT GTTCAAGCTA AAGCCATTGA TGAAGCTGGA ATTGATGCTG 3420

ACAATGTCCT TTCAGGAATT CCACTTGCTG TTAAGGATAA CATCTCTACA GACGGTATTC 3480

TCACAACTGC TGCCTCAAAA ATGCTCTACA ACTATGAGCC AATCTTTGAT GCGACAGCTG 3540

TTGCCAATGC AAAAACCAAG GGCATGATTG TCGTTGGAAA GACCAACATG GACGAATTTG 3600

CTATGGGTGG TTCAGGTGAA ACTTCACACT ACGGAGCAAC TAAAAACGCT TGGAACCACA 3660

GCAAGGTTCC TGGTGGGTCA TCAAGTGGTT CTGCCGCAGC TGTAGCCTCA GGACAAGTTC 3720

GCTTGTCACT TGGTTCTGAT ACTGGTGGTT CCATCCGCCA ACCTGCTGCC TTCAACGGAA 3780

TCGTTGGTCT CAAACCAACC TACGGAACAG TTTCACGTTT CGGTCTCATT GCCTTTGGTA 3840

GCTCATTAGA CCAGATTGGA CCTTTTGCTC CTACTGTTAA GGAAAATGCC CTCTTGCTCA 3900

ACGCTATTGC CAGCGAAGAT GCTAAAGACT CTACTTCTGC TCCTGTCCGC ATCGCCGACT 3960

TTACTTCAAA AATCGGCCAA GACATCAAGG GTATGAAAAT CGCTTTGCCT AAGGAATACC 4020

TAGGCGAAGG AATTGATCCA GAGGTTAAGG AAACAATCTT AAACGCGGCC AAACACTTTG 4080
AAAAATTGGG TGCTATCGTC GAAGAAGTCA GCCTTCCTCA CTCTAAATAC GGTGTTGCCG 4140

TTTATTACAT CATCGCTTCA TCAGAAGCTT CATCAAACTT GCAACGCTTC GACGGTATCC 4200

GTTACGGCTA TCGCGCAGAA GATGCAACCA ACCTTGATGA AATCTATGTA AACAGCCGAA 4260

GCCAAGGTTT TGGTGAAGAG GTAAAACGTC GTATCATGCT GGGTACTTTC AGTCTTTCAT 4320

CAGGTTACTA TGATGCCTAC TACAAAAAGG CTGGTCAAGT CCGTACCCTC ATCATTCAAG 4380

ATTTCGAAAA AGTCTTCGCG GATTACGATT TGATTTTGGG TCCAACTGCT CCAAGTGTTG 4440

CCTATGACTT GGATTCTCTC AACCATGACC CAGTTGCCAT GTACTTAGCC GACCTATTGA 4500

CCATACCTGT AAACTTGGCA GGACTGCCTG GAATTTCGAT TCCTGCTGGA TTCTCTCAAG 4560 

GTCTACCTGT CGGACTCCAA TTGATTGGTC CCAAGTACTC TGAGGAAACC ATTTACCAAG 4620

CTGCTGCTGC TTTTGAAGCA ACAACAGACT ACCACAAACA ACAACCCGTG ATTTTTGGAG 4680

GTGACAACTA ATGAACTTTG AAACAGTCAT CGGACTTGAA GTCCACGTAG AGCTCAACAC 4740

CAATTCAAAA ATCTTCTCAC CTACTTCTGC CCACTTTGGA AATGACCAAA ATGCCAACAC 4800

TAACGTGATT GACTGGTCTT TCCCAGGAGT TCTACCAGTT CTCAATAAAG GGGTTGTTGA 4860

TGCCGGTATC AAGGCTGCTC TTGCCCTCAA CATGGACATC CACAAAAAGA TGCACTTTGA 4920

CCGCAAGAAC TACTTCTATC CTGATAACCC CAAAGCCTAC CAAATTTCTC AGTTTGATGA 4980

ACCAATCGGA TATAATGGCT GGATTGAAGT CAAACTAGAA GACGGTACGA CCAAGAAAAT 5040

CGGTATCGAA CGTGCCCACC TAGAGGAAGA CGCTGGTAAA AACACCCATG GTACAGATGG 5100 

CTACTCTTAT GTTGACCTCA ACCGCCAAGG GGTTCCCTTG ATTGAGATTG TATCTGAGGC 5160 

AGATATGCGT TCTCCTGAAG AAGCCTATGC TTATCTGACA GCCCTCAAGG AAGTTATCCA 5220

GTACGCTGGC ATTTCTGACG TTAAGATGGA GGAAGGTTCG ATGCGTGTGG ATGCCAACAT 5280

CTCCCTTCGT CCTTATGGTC AAGAGAAATT CGGTACCAAG ACTGAATTGA AGAACCTCAA 5340

CTCCTTCTCA AACGTTCGTA AAGGTCTTGA ATACGAAGTC CAACGCCAGG CTGAAATTCT 5400

TCGCTCAGGT GGTCAAATCC GCCAAGAAAC ACGCCGTTAC GATGAAGCGA ATAAAGCAAC 5460

CATCCTCATG CGTGTCAAGG AAGGGGCTGC TGACTACCGC TACTTCCCAG AACCAGACCT 5520

ACCCCTCTTT GAAATTTCTG ACGAGTGGAT TGAGGAAATG CGGACTGAGT TGCCAGAGTT 5580

TCCAAAAGAA CGTCGTGCGC GTTATGTATC TGACCTTGGT TTATCAGACT ACGATGCTAG 5640 

TCAGTTGACT GCTAATAAAG TCACTTCTGA CTTCTTTGAA AAAGCTGTTG CCCTAGGTGG 5700

TGATGCCAAA CAAGTCTCTA ACTGGCTCCA AGGGGAAGTC GCTCAGTTCT TGAATGCTGA 5760

AGGTAAAACA CTGGAACAAA TCGAATTGAC ACCAGAAAAC TTGGTTGAAA TGATTGCCAT 5820
CATCGAAGAC GGTACTATTT CATCTAAGAT TGCCAAGAAA GTCTTTGTCC ATCTAGCTAA 5880

AAATGGCGGT GGCGCGCGTG AATACGTGGA AAAAGCAGGT ATGGTTCAAA TTTCAGATCC 5940 

AGCTATCTTG ATCCCAATCA TCCACCAAGT CTTTGCCGAT AACGAAGCTG CTGTTGCCGA 6000 

CTTCAAGTCA GGCAAACGTA ACGCCGACAA GGCTTTACAG GATTCCTTAT GAAGGCAACC 6060

AAAGGCCAAG CCAACCCACA AGTTGCCCTT AAACTACTTG CACAGGAATT GGCGAAGTTG 6120

AAAGAAAACT AGACAGAACA AAACCAGCCC TAAGGTTGGT TTTTTCTTCT CTACCAACTC 6180 

CCAATAACTA TTTTGGCTTT ATTTCCAGAG TATTTTATGG TAAAATGAAG AGTAATAATA 6240 

TTTATTAAAG AGGTAAAAAC ATGATTGAAG CAAGTACCTT AAAAGCTGGT ATGACCTTTG 6300

AAACAGCTGA CGGCAAATTG ATTCGCGTTT TGGAAGCTAG TCACCACAAA CCAGGTAAAG 6360

GAAACACGAT CATGCGTATG AAATTGCGTG ATGTCCGTAC TGGTTCTACA TTTGACACAA 6420 

GCTACCGTCC AGAGGAAAAA TTTGAACAAG CTATTATCGA GACTGTCCCA GCTCAATACT 6480

TGTACAAAAT GGATGACACA GCATACTTCA TGAATACAGA AACTTATGAC CAATACGAAA 6540

TCCCTGTAGT CAATGTTGAA AACGAATTGC TTTACATCCT TGAAAACTCT GATGTGAAAA 6600

TCCAATTCTA CGGAACTGAA GTGATCGGTG TCACCGTTCC TACTACTGTT GAGTTGACAG 6660

TTGCTGAAAC TCAACCATCT ATCAAAGGTG CTACTGTTAC AGGTTCTGGT AAACCAGCAA 6720

CGATGGAAAC TGGACTTGTC GTAAACGTTC CAGACTTCAT CGAAGCAGGA CAAAAACTCG 6780 

TTATCAACAC TGCAGAAGGA ACTTACGTTT CTCGTGCCTA ATCTCTAGAA AGAGGTCATT 6840

CTATGGGAAT TGAAGAACAA CTTGGCGAAA TCGTTATCGC CCCACGTGTA CTTGAAAAAA 6900

TCATTGCTAT CGCTACTGCA AAGGTAGAGG GTGTTCACTC TTTTTCAAAC AGATCAGTGT 6960 

CTGATACCCT TTCAAAACTT TCACTCGGCC GTGGCATTTA TCTTAAAAAC GTGGACGAAG 7020

AACTCACAGC AGATATCTAT CTCTACCTTG AGTACGGAGT AAAAGTTCCT AAGGTAGCGG 7080 

TTGCTATCCA GAAAGCTGTC AAAGATGCCG TCCGTAATAT GGCTGATGTA GAACTCGCTG 7140 

CTATCAATAT TCACGTTGCA GGTATCGTCC CAGATAAAAC ACCAAAACCA GAATTGAA\G 7200

ATCTATTTGA CGAGGACTTC CTCAATGACT AGTCCACTAT TAGAATCTAG ACGCCAACTC 7260

CGTAAATGCG CTTTTCAAGC TCTCATGAGC CTTGAGTTCG GTACGGATGT CGAAACTGCT 7320 

TGTCGTTTCG CCTATACTCA TGATCGTGAA GATACGGATG TACAACTTCC AGCCTTTTTG 7380 

ATAGACCTCG TTTCTGGTGT TCAAGCTAAA AAGGAAGAAC TAGATAAGCA AATCACTCAG 7440

CATTTAAAAG CAGGTTGGAC CATTGAACGC TTAACGCTCG TGGAGAGAAA CCTCCTTCGC 7500

TTGGGAGTCT TTGAAATCAC TTCATTTGAC ACTCCTCAGC TGGTTGCTGT TAATGAAGCT 7560

ATCGAGCTTG CAAAGGACTT CTCCGATCAA AAATCTGCCC GTTTTATCAA TGGACTGCTC 7620 
AGCCAGTTTG TAACAGAAGA ACAATAAGGC TCTTTGTCAA CTGTAGTGGG TTGAAAAAAA 7680

GCTAAGCTCG AGAAAGGACA AATTTCGTCC TTTCTTTTTT GATGTTCAAA GCGATAAAAA 7740

TCCGTTTTTT GAAGTTTTCA AAGTTTCGAA AACCAAAGGC ATTGCGCTTG ATAAGTTTGA 7800 

TGAGATTATT GGTCGCTTCC AGTTTGGCAT TAGAATAGTG TAGTTGAAGG GCGTTGACAA 7860

TCTTTTCTTT ATCTTTGAGG AAGGTTTTAA AGACAGTCTG AAAAATAGGA TGAGCCTGCT 7920

TAAGATTGTC CTCAATAAGT CCGAAAAATT TCTCTGGTTC CTTATTCTGG AAGTGAAACA 7980 

GCAAGAGCTG ATAGAGCTGA TAGTGGTGTT TCAAGTCTTG TGAATGGCTC AAAAGCTTGT 8040 

CTAAAATCTC TTTATTGGTT AAGTGCATAC GAAAAGTAGG ACGATAAAAT CGCTTATCAC 8100

TCAGTCTACG GCTATCCTGT TGAATGAGTT TCCAGTAGCG CTTGATATCC TTGTATTCAT 8160

GGGATTTTCG ATGAAACTGA TTCATGATTT GGACACGCAC ACGACTCATG GCACGGCTAA 8220 

GATGTTGTAC AATGTGAAAG CGATCAAGAA CGATTTTAGC ATTCGGGAGT GAAACAGTCT 8280 

GGGAGACTGT TTCAGCCTGA GCCTAGGAAT TTGAAAGCGA AGCTGTTTAG CCAAGTCATA 8340

GTAAGGGCTA AACATATCCA TAGTAATAAT TTTGACGCGA CATCGGACAA CTCTATCGTA 8400

GCGAAGAAAG TGATTTCGAA TGATAGCTTG TGTTCTACCC TCAAGAACAG TGATGATATT 8460

GAGATTGTTA AAATCTTGCG CAATGAAGCT CATCTTTCCC TTTGTAAAAG CATACTCATC 8520

CCAAGACATA ATCTCAGGAA GACAAGAAAA ATCATGTTTA AAGTGAAAAT CATTGAGCTT 8580

ACGAATAACA GTTGAAGTTG AGATGGAAAG CTGATGGGCA ATATCAGTCA TAGAAATCTT 8640

TTCAATCAAC TTTTGAGCAA TCTTTTGGTT GATGATACGA GGGATTTGGT GATTTTTCTT 8700 

GACGATAGAA GTTTCAGCGA CCATCATTTT TGAACAGTGA TAGCACTTGA ATCGACGCTT 8760

TCTAAGGAGA ATTCTAGTAG GCATACCAGT CGTTTCAAGA TAAGGAATTT TAGAAGGTTT 8820

TTGAAAGTCA TATTTCTTCA ATTGGTTTCC GCACTCAGGG CAAGATGGGG CGTCGTAGTC 8880 

CAGTTTGGCG ATGATTTCCT TGTGTGTATC CTTATTGATG ATGTCTAAAA TCTGGATATT 8940

AGGGTCTTTA ATGTCTAGTA ATTTTGTGAT AAAATGTAAT TGTTCCATAT GAATCTTTCT 9000

AATGAGTTGT TTTGTCGCTT TTCATTATAG GTCATATGGG ACTTTTTTTC TACAATAAAA 9060

TAGGCTCCAT AATATCTATA GGGGATTTAC CCACTACAAA TATTATAGAG CCAACAATAA 9120

AAAGAAAAAG TGTTTGATAG ATATCAAACA CTTTTTTCTT TGCCTCCCAC TATCTAAAAA 9180

AATGATAATA GATATAATTG TAAACAAAAA TCCAGATAGG TTTTGCATGA TTGAGAAAGT 9240 

TAAAAAAACT ATGGCAGAGA ATCGTTAATC TCAGATTGTC GGTAGAACGA TAAACAAGGG 9300

CAAAAAAGAA ACCAATCAGA CTATAATATA ATAAACTAAT TGGATCTCTG TGAGATAGTA 9360

TCAAATGGCT AATCCCAAAG ATGATAGCAG ATAGGATAAC ATCCAAATAG TACTTGGACT 9420

AGGGAAAGAA GGTATTCATA AAATACCCTC TATCAAGAGT CTCCTCAAAA ACAGGACCGA 9480

TGATTACAGG CAGGACAAAA GATAAGATAG TCGATAAAAA GGTTGGTTGT CCATTTGAAA 9540

AAAGCACGGT AAAATACTCA TCATGAATAT TCCTATGATT AATCAAATGA GCATAGCGTG 9600

CCCAAAAATT ACCGAGAATC TGATAAACCA CATAAGTTGC AAATAAGTAG AAGACAAATG 9660

ACCAGTTCCA GCTCTTTTTC TCAAAGATAA AGAGCATCTT TTTCTTTTTT AACCTCCAAA 9720

TTAATAGAAG GAAACTTCCC ACTAATCCCA TTGTTAAAAT AAGAGAATAG ACATCAGCTC 9780

CTAACCCTAA AATGATCGTC ACATACAATC CAATTGTTTG TGGTAAATAG GTAGATAGTA 9840

AAATAATAAG CAAAAATATT CCAAATTGTC TTAGTTTTTT TGTGTTTCTC ATCGTACTTT 9900

TTTGAAAGAT TACCCTGCTC GGAAGCCGTA CTTCCAAGCA TCTATATAAG AATTAAGTGC 9960

CCCTTGCCTC ATATAGGGAG CAAATTCTCT ATAATATAAC CATCTACTAT ATCCATCTTC 10020

CCAAACAGCA AGACCACCTG AAGTTTGCTC CAAGTCCTCA GTTGAAAGAA CTGTAAATGT 10080

ATTTGTACCT GTCATTGCAA GTACCTTCTT AAAATAGATT GTTGTAGGCT CACATTTATA 10140

GTATATTTCT TTTTTTGTCT ATTTTATAGC CCATCTCCTC AACTGGCAAT TTTTCGACCT 10200

GAATTACATT TTTCCATAAA AAATGAGACC TTTCTAGTCT CATTTAGTCA TTCTTAGTAT 10260

TTTCTAAATC GTTGATAGCG TTCTTCCAGC AACTCTTCTA GCGGTTTTTG TGAAAGTCTA 10320

GCCAGCTCCG TTTGGAGTTC TTTTTTGACA CTCTTAATCA GTTCTTTACT AGAAAGTCCT 10380

ATTTCAGAAA TCACCTTATC CACCACGTCC ATTTCTAACA GTTCATGCGA AGTGATTTTC 10440

ATCAGTTCTG CTGCTTCCAT AGCGCGAGTA CCGTCCTTCC ATAAAATGGA AGCAAAGCCT 10500

TCTGGACTGA GAATGGCATA GATAGAATTT TCCAGCATCC AGACACGGTC CGCGACAGCT 10560

AGAGCCAGAG CCCCGCCTGA ACCACCTTCA CCGATAATAA TGGCGATAAT AGGAACTTTC 10620

AGGTCACTCA TTTCCATGAG ATTGCGAGCG ATAGCTTCCC CTTGACCACG TTCTTCCGCT 10680

CCGACACCAG GATAAGCACC TGCTGTATTG ATAAAGGTCA CAACTGGACG GCCAAATTTC 10740

TCAGCCTGTT TCATCAACCG CAGTGCCTTT CGGTAGCCTT CTGGATGTGG TTGGCCAAAA 10800

TTCCGTTTGA GGTTGTCTTG CAAACTCTTG CCTTTTTGGA TACCAACCAC TGTTACAGCT 10860

TGGTCTCCAA GCCAACCAAT ACCACCAACA ACTGCACCAT CATCACGAAA AGAACGGTCA 10920

CCATGTAATT GGATAAATTC ATCAAAAATG CCTGTCGCAA AGTCCAAGGT TGTCAAGCGA 10980

CTCTGCTCAC GCGCTTCTCT GACTATTTTT GCAATATTCA TCTAGGACTC CCTCCATGCA 11040

ATCTGACTAG GCTAGCAATC GTATCTGGTA AGTCTCTTCT TTTGACAATA GCATCCACAA 11100

AGCCATGTTC TAATAGGAAT TCTGCCTTTT GGAAATCCTC AGGCAAGCTT TCACGAACCG 11160

TATTTTCAAT CACACGACGC CCAGCAAAAC CAACCAAGCT CTGTGGTTCA GCCAGAATGA 11220

TATCGCCTTC CATAGCGAAA GAAGCTGTCA CACCACCAGT CGTTGGATCT GTCAAAATGG 11280

TCAGGTAAAA GAGACCAGCA TTTGAATGGC GTTTAACCGC CGCAGAGATC TTAGCCATCT 11340

GCATGAGACT CATGATTCCT TCCTGCATAC GGGCTCCACC AGAGGCTGTG AATAGGACAA 11400 

CTGGCAATTT TTCGACAGTC GCATACTCAA ACAAACGAGT GATTTTTTCA CCTACAACCG 11460

TACCCATAGA AGCCATGATA AAGTTAGAAT CCATAATCCC AAGAGCCACA GTCTGACCTT 11520

TAATAAGAGC AGTTCCTGTC ACAACGGCTT CATGCAGACC TGTTTTTTCA CGCATAGATG 11580 

CCAGTTTCTT TTGGTAACCA GGGAAATGCA AGGGATCCTT GCTTTCAATC CCTGTAAACA 11640 

ATTCTTTGAA GGTTCCCATA TCAATCGTCA AAGCCAAGCG TTCTTGGGCA GAAATACGAA 11700

AGGTATAGCT ACAGTGCGGA CAGATACGTT CACTTCCCAG ATCCTTCTGA TAGATGGTAT 11760

GCTTACAGCC TGGACACTGG GAAAATAATT CATCTGGAAC CTCTGGCTTA GCTTGAGGTT 11820 

TTTCCCTAAC CGAACGATTG GGATTGATTC GAATATACTT ATCTTTTTTA CTAAATAGAG 11880

CCATTGATTC CCCTTTTCGG TTTAAACTCT TAAAGTCATT TTATTCTTTT TCTTGATATT 11940 

TAGGTAAGAA GGTTTCCATC AAGAAGGAAG TATCATAATC CCCAGCAATG ACATTGCGAT 12000

CTGAAATGAG GTCAAGCTGG AAATCTGCAT TGGTCTGCAC TCCTTCAATT TCTAATTCAT 12060

AGAGGGCACG TTGCATTTTC ATCAAGGCGT CAAAACGATT TTCGCCGTGT ACTATGATTT 12120

TGGCAATCAT ACTATCATAA TAAGGCGGAA TGGTATAACC TGGATAAACT GCTGAATCCA 12180

CGCGCAAGCC AACTCCACCA CTTGGCAGAT AGAGATTAGT AATCTTACCT GGACTTGGAG 12240

CAAAGTTAAA GGCTGGGTTT TCTGCATTGA TACGACACTC GATGGCATGA CCGCGTAGGA 12300

CAATATCTTC TTGCTTAACA GACAAAGGCT GACCTGCCGC AATGCAAATC TGTTCCTTAA 12360

CGATATCAAC ACCTGAAACA AACTCTGTTA CTGGATGTTC TACCTGAACA CGAGTATTCA 12420

TCTCCATGAA ATAGAAATTG CTACTTGCTT CATCAAGAAG AAATTCAATG GTTCCTGCAT 12480

TCTCATAGCC AACAAACTCT GCCGCTCGAA CAGCAGCAGC ACCTATTTCA TGACGCAGCG 12540

TTTTTCCGAT TGCAATCGAG GGACTTTCTT CCAAAACCTT TTGGTTATTC CTTTGAAGAG 12600

AACAATCCCG TTCACCCAAG TGAATCACAT GTCCATGCTC ATCACCTAGG ATTTGAACCT 12660

CAATGTGCCG AGCTGGATAG ATAACCCGTT CTATGTACAT GGCACCATTG CCATAATTGG 12720

CCTTGGCCTC ACTAGAGGCA GTTTCAAAGG CAGAAACGAG GTCATCTGGT TTTTCAACCT 12780

TACGAATCCC TTTACCACCT CCACCTGCTG AAGCCTTGAG CATAACAGGA TAGCCAATTT 12840

TTTCAGCAAC AATCAAAGCT TCTTCAGAGT TATGCACTTC TCCATCTGAA CCTGGTATAA 12900




CAGGCACACC TGCTTTAATC ATCTGAGCAC GCGCATTGAT CTTATCCCCC ATCATATCCA 12960

TAACATGACC AGATGGACCG ATAAACTTGA TACCTACTTC TTCACACATG GTCGCAAATT 13020 

TGGAATTTTC ACTGAGAAAT CCAAAACCAG GGTGAATAGC TTCTGCCTCA GTCAAGACTG 13080

CAGCTGATAG AACTGCATTA ATATTGAGAT AAGACTCTGT TGCCTTGCCA GGACCAATAC 13140

AAACTGCTTC ATCTGCCAAA AGCGTATGAA GAGCTTCCTT ATCAGCAGTT GAATAAACCG 13200

CTACCGTCGC AATCCCCAAT TCACGTGCCG CACGGATAAT ACGAACCGCA ATTTCACCAC 13260

GATTGGCAAT TAAAATTTTT CGAAACATGG AGAACCTCCT TAGTTCCCAA TTGCAAAAGT 13320 

AAGGGTACCA CTGGCTGCAA GCTTGCCATC CACTTCAGCC TTTGCTTCAA CCACAGCTAT 13380

GGTGCCACGA CGTTTTACAA AAGTCGCTGT CATAACCAAT TGGTCGCCTG GTACAACTTG 13440

CTTCTTGAAC TTAACCTTGT CCATACCAGC GTAAAAGACC AGTTTTCCTT TATTTTCAGG 13500

TTTTGATAAC TCCAACACAC CGGCAGTTTG CGCCAAGGCT TCCATAATCA CAACACCTGG 13560

CATAACTGGG TATTGAGGAA AGTGGCCGTT AAAGAAAGGC TCGTTGATGG TCACATTTTT 13620

GATAGCAACA ATGGTATCCT CGCTCACTTC CAAGACACGG TCCACTAGAA GCATAGGATA 13680

ACGGTGGGGA AGAGCTTCTT TGATTCCTTG AATATCGATC ATTTGATACG TACCAATCCT 13740 

TTACCAAACT CAACCATTTC TTCGTTAGAG ACGAGAATTT CCGTTACCAC ACCATCCTTA 13800

GGAGCTGGGA TTTCATTCAT GACTTTCATG GCTTCGATAA TTACCAATGT TTGACCTTTT 13860

TTGACACTAT CACCAACTGT AACGAAGGCA GGTTTATCTG GTCCAGCAGC CAAGTAAACC 13920

ACTCCAACAA GTGGACTCTC TACAAGATTT CCCTCAGTAG CCACACTTGC TTCAGCTGGA 13980

GCTGGAACTT CTTCTGCTAC AGTCTCTGCT GGAGCAGATG TAGGAGCTAC TGGACTCGGT 14040

GTTGCTAGAA CGGGTGCTGG AGCGACTTGA GTTGCAACTT CAGGCACAGG TCTTGCTTCA 14100 

TTCTTGCTAA ACTGCAACTC ATCCGTCCCA TTTTTATAAG AAAATTCTCT CAAACTTGAC 14160 

TGGTCAAATT GAGTCATCAA GTCTTTAATA TCGTTTAAAT TCATACTTAT CTATTCTCCC 14220

AACGTTTGAA AGCAAGAACT GCATTGTGGC CTCCAAAACC AAAAGTATTT GAAATAGCGT 14280

ATGGAATTTC TTTCTCCAAG CCTTGTCCAT AAACGACATT AGCTTCGATA TAATCTGATA 14340 

CTTCACTTGT CCCAGCTGTC ATTGGTACAA AGTTATGACG CATAGCTTCG ATGGTGACGA 14400

TAGCTTCTAC TGCACCCGCA GCCCCCAGCA AATGTCCTGT AAAAGACTTG GTTGATGATA 14460

CAGGTACTTC CTTACCAAGA ACAGCTACGA TAGCACCACT TTCTCCTTTT TCATTGGCAG 14520

GAGTTGACGT TCCGTGAGCA TTGACATAGG CTACTTGCTC TGGAGAAATC TCAGCTTCTT 14580 

CCAAGGCTAG TTTGATGGCC TTGATAGCTC CCTGACCTTC TGGATGTGGA GAAGTCATGT 14640 

GGTAGGCATC ACAAGTATTT CCGTAACCAA CCACTTCAGC CAGGATAGTA GCTCCACGTT 14700

TTTCAGCGTG TTCAAGACTT TCTAGAACCA ACATCCCTGA ACCTTCACCC ATAACAAACC 14760

CATTGCGATC CTTATCAAAT GGGATCGAAG CACGAGTTGG ATCCTCTGTA GTAGAGAGAG 14820

CTGTTAAGGC TTGGAAACCA GCGATGGCAA AAGGTGTGAT AGAAGCTTCT GTTCCTCCCA 14880

CCAACATCAC ATCTTGGAAA CCAAACTTAA TGGAGCGGAA GGCATCCCCA ATCGCATCAT 14940

TTGATGAAGA GCAGGCAGTA TTGATAGATT TACAAACACC GTTTGCACCA AAACGCATGG 15000

CTACATTCCC AGAAGCCATA TTTGGTAAAG CTTTTGGAAG AGTCATTGGT TTGACACGTT 15060

TGGGTCCTTT TTCATGAAGG CGAAGTACCT GATCTTCAAT TTCCTTGATT CCACCAATAC 15120

CAGATGCAAC GATAACACCA AAACGATCCC TATTAAGAGC CTCTACATCA AGATTGGCAT 15180 

GATTTACAGC CTCTTGGGCT GCATACAAGG CATATAAAGA ATAGTTATCA AAACGGTTGG 15240

TATCTTTTTT TACAAAGTAT TTATCGAACG GAAAATCTTG GATTTCTGCC GCATTATGCA 15300

CATCAAAGTC ACTATGATCA AATTTTGTAA TGCCACCAAT GCCGATTTTC CCAGTTGCTA 15360

AACTATTCCA AAATTCTTCT GGTGTATTTC CGATTGGAGA TGTTACTCCA TAACCTGTTA 15420 

CCACTACTCG ATTTAGTTTC ATTCTTTTCA CCTCTAGCTT TCGCTACATA CTTAAGCCAC 15480 

CATCAATGGC AACCACTTGT CCAGTTAGAT AATCTTGGCC TGCTAAAAAT ACTGTCAAAT 15540

CTGCAACCTG CTCTGCCTGC CCAAATTCTT TCATCGGAAT CTGAGCTAGT GTAGCTTCCT 15600

TAATCTTATC TGACAGGATA GCGGTCATAT CAGACTCAAT CATTCCTGGA GCAATCACAT 15660

TGACTCGTAT ATTCCGACTA GCGACCTCGC GTGCCACAGA CTTGGTAAAG CCAATCAAGC 15720 

CAGCCTTAGA AGCAGCATAA TTAGCTTGAC CAATATTCCC CATCAAACCA ACAACACTAG 15780

ACATATTAAT GATAGCACCT TCTCTGGCTT TCATCATCGG TTTCAAGACT GATTGTGTCA 15840

TATTAAAGGC ACCAGTCAGA TTGACCTTGA GCACTTTTTC AAAATCTGCT TCTGTCATCT 15900 

TGAGCATAAG AGTATCTTGG GTAATCCCTG CATTGTTGAC CAAAACATCT ACTGAACCCA 15960

GTTCTGCAAT AGCTTGATCA ATCATACGCT TAGCGTCTGC AAAATCTGAT ACATCTCCTG 16020

AAATGGGAAC CACCTTGATA CCATAGTTTG AAAACTCAGC GAGCAATTCT TCTGAGATTG 16080

CCCCACGACT GTTTAAGACA ATGTTGGCTC CTGCTTGAGC AAACTTGTGG GCGATGGCAA 16140

GACCAATTCC ACGACTCGAA CCTGTAATAA AGATATTTTT ATGTTCTAGT TTCATTTTTT 16200

TCCTTTCAAA ACTTCTACTT ATTTTAGTCT ATTTTTCTAA AAGTGCTACT AAACTCGCTT 16260 

GATCTTCCAC ATGAGCTAAG TGAGCAGTTT GATCAATTTT TTTAACAAAA CCTGACAAGA 16320 

CTTTCCCCGG TCCAATCTCG ATAAAGTTGC TTATGCCTGC TTCTTGCATG ACCCCAATAC 16380

TTTCATAGAA ACGAACGGGT TCCTTGACCT GACGCGTCAA GAGCTGAGCA ATGTCCTCTT 16440

TTTGCATCAC AGCAGCTTCT GTATTGCCGA CTAGGGGACA AGTAAAATCT GAAAAACTTA 16500

CCTGAGCTAG AGTTTCAGCT AGTTTCTGGC TAGCAGGTTC AAGGAGAGCG GTGTGAAAGG 16560 

GACCTGACAC CTTAAGAGGA ATCAAGCGTT TGGCACCTGC TTCTTGCAAA AGTTCAACCG 16620 

CTCGATCAAC TGCAACCACT TCTCCAGCAA TGACGATTTG TGCAGGTGTG TTATAGTTGG 16680

CTGGAGTAAC CACTCCAAGT TCAGAAGCTT TTTGACAGGC TTCTTCAATG ACCTCTACTG 16740 

GCGTATTGAG AACTGCTACC ATCTTGCCAG AGTCAGCAGG AGCCGCTTCT TCCATATAGG 16800

CTCCACGCTT AGCTACCAAG GCAACCGCAT CTTCAAAATC CAAGGCGCCA CTTGCCACCA 16860

AGGCAGAGTA TTCTCCAAGA GACAAACCAG CAACCATATC AGGCTGATAG CCCTTTTCTT 16920 

GCAATAAACG GTAGATAGCA ACCGAAGTCG CTAGAATGGC TGGTTGCGTA TAGCGGGTCT 16980 

GATTGAGTTT GTCTTCTTCC GTATCGATGA GATAACGCAA ATCATAACCG AGCACCTGGC 17040 

TCGCTCGATC AATCGTTTCT TTAACAATCG GATACTGATC ATAGAAATCC CGTCCCATCC 17100

CTAGATACTG GGCACCTTGA CCAGCAAATA AAAAGGCTGT TTTAGTCATT TCTTACAACT 17160 

CCTGTCCAGC GAGAGGCTTC TTCTTGAATT TTCTTAGCGG CTCCGTAATA CAAATCTTTT 17220 

AGGATTTCTT CAGCTGTTTC TTCTTTAGAA ACAAGCCCTG CGATTTGACC TGCCATAACA 17280

GAGCCACCAT CCACATCACC GTGAACAACT GCTTTGGCTA GAGCACCTGC TCCCATTTGT 17340

TCAAAGATTT CTAAATCAGG ATCTTCTTGC TTAAAGGCAT CTTTTTCAGC CAGTTCAAAA 17400 

TCTCTAGTCA ACTGATTTTT AATAGCACGA ACAGCATGAC CAAAGTGCTG AGCTGAAATC 17460

GTAGTATCAA TATCCCTTGC TTTTAAAATT TTCTCCTTGT AGTTTGGATG GGCATTCGAC 17520

TCTTTTGCAA CTACAAACCG TGTCCCCACC TGTACAGCCT CTGCACCTAG CATAAAGCCA 17580

GCCGCAGCAC CTTCACCATC CGCAATTCCT CCTGCAGCAA TAACAGGAAT AGATATAGCT 17640

GTGGCTACCT GTCGCACCAA GGTCATGGTT GTTAATTTAC CGATATGCCC CCCAGCTTCC 17700

ATTCCTTCTG CAATAACAGC GTCTGCACCG ATTTTTTCCA TGCGTTTAGC TAAAGCGACA 17760 

CTAGGAACAA CAGGAATAAC GATTATCCCA GCTTCATGGA AACGTTCCAT ATACTTGCTT 17820

GGATTTCCTG CTCCTGTTGT GACAACTTTA ACACCTTCTT CAATAACGAG ATCCACGATG 17880

TCTTCCACAA AGGGAGATAA GAGCATGATG TTGACCCCAA AGGGTTTATC AGTCAATGAT 17940

TTGATTTTAT CAATATTGGC CTTGACAACT TCTTTCGGGG CATTTCCCCC ACCGATAATT 18000 

CCTAATCCTC CAGCCTTGGA AACAGCCCCT GCCAAATCAC CATCAGCAAC CCAGGCCATC 18060

CCTCCTTGGA AAATAGGATA ATCAATCTTC AATAATTCTG TAATACGCGT TTTCATAGTG 18120

CCTCCAACCT TCCTTGCTTA CGTAATAGTT CGATTTCACC ATAATTTGAC AGTCAAACTA 18180

TTACCTAAAC AAGAGGGAGT GGGTTTCTCC CTACTCCTTC TACTAATATT CTGCTTATTT 18240

TGCTTGCTCT TCAACGTAAG CAACCAAGTC ACCAACTGTT TTCAAGTCAT TTTCTGCTTC 18300

GATTTGGATA TCAAAAGCAT CTTCGATTTC TGAGATTACT TGGAACAAGT CCAATGAATC 18360

TGCGTCCAAA TCATCAAAAG TTGATTCAAG TGTTACTTCT GATGCGTCTT TTCCAAGTTC 18420

TTCAACGATA ATTTCTTGTA CTTTTTCAAA TACTGCCATG ATAGGACTCC TTTAAAATAA 18480

ATAGTTTTTT TATAACAATG TGTTCACCAC ATGATTACCT AAATTGTAAG AATGAGCGTG 18540 

CCCCAGGTCA AGCCTCCACC GAAGCCTGAT AGAAGAACAG TCTGGCTACC ATCTAAAGGG 18600

ATGAGACCTT GTTCTACACA CTCTGAAAGT AAAATCGGGA TACTGGCTGC ACTGGTATTG 18660

CCATATTCCA TCATATTGGC TGGAAGTTTG GCTCGGTCAA CACCAATTTT TCTAGCCATC 18720 

TTATCCAAAA TACGGTCATT GGCTTGATGA AGTAGCAGAT AATCCAAGTC TGTCACCTCT 18780

ATAGGAGATT CATCAATAGT CTGCTTGATA GACTTGGCTA CATCTCGAAT GGCAAAATCA 18840 

AAGACTGTGC GTCCATCCAT CTTCAAAAAC GAATCTGCAC TTTCTTGATC TGAAAATGGA 18900 

GAATGTAAAC CTGAATGCCC ATAAGTTAAA CACTCGCTGC GACTTCCATC GCTATTGAGA 18960 

CTCTCAGCTA AGAAATGCTC TTGCTCGCTA GCTTCTAACA AGACACCACC AGCACCATCT 19020

CCAAACAACA CAGCTGTTGA TCGATCCGAC CAATCGACTG CCTTAGAGAG GGTTTCACTA 19080 

CCAATCACCA AGCCTTTTTG AAAGCGACCA GAAGCGATAA ACTTTTCAGC AGTTGAAAGA 19140

GCAAATACAA ATCCACTGCA AGCCGCGGTT AAGTCAAAAG CAAAGGCTTT ATTAGCACCA 19200

ATATTAGCTT GAACACGAGC AGCTGTAGAG GGCATCATCG AATCTGGAGT AATGGTAGCT 19260

AGGATGATAA AATCCAGTTC TTCTCCTGTT ATTCCAGCTT TTGCCATCAG TTTCTTAGCA 19320 

ACCTCTGTAG CCAAATCACT GGTAGATTCT GTTCTTGAAA TATGCCTTTG TCGTATTCCC 19380

GTTCGACTTG AAATCCACTC ATCATTGGTA TCCATAATCT GAGCCAAGTC GTGATTTGTA 19440

ACCACTTGCT CTGGCACATA ATGAGCAACC TGACTTATTT TTGCAAAAGC CATTATTTCA 19500

AATCCTCCAA AAATTGGTAA AGATTAGTCA AACCTTTACC CATGACAGCA ATTTCTTCCT 19560 

CGCTCATGCC ATCAATAATT TTTTCTACCA TGGCCTTGTG GAAGCGTTTA TGCAGTCTAT 19620 

GAATCAAGCG ACCCTTCTTT GTCAAATGCA GATGCACCAC ACGACGATCC TGTTCTGACC 19680 

GAACTCGCTC AATGTAGCCC GG 19702

(2) INFORMATION FOR SEQ ID NO: 8:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 6211 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:
GAAAATTTCC TCTCTTCTCT TGAAAAATTT TGAAAAAATG GTATGATAGT AACAAGTTAT 60

TTTTAAGAGG AAAGAAAGGG GAATAATGGA GAAAATCAGT TTAGAATCTC CTAAGACGGG 120

GTCGGACCTA GTTTTGGAAA CACTTCGTGA TTTAGGAGTT GATACCATCT TTGGTTATCC 180

TGGTGGTGCG GTTTTGCCTT TTTATGATGC GATATATAAT TTTAAAGGCA TTCGCCACAT 240

TCTAGGGCGC CATGAGCAAG GTTGTTTGCA TGAAGCTGAA GGTTATGCCA AATCAACTGG 300

AAAGTTGGGT GTTGCCGTCG TCACTAGTGG ACCAGGAGCA ACAAATGCCA TTACAGGGAT 360

TGCGGATGCC ATGAGCGATA GCGTTCCCCT TTTGGTCTTT ACAGGTCAGG TGGCGCGAGC 420

AGGGATTGGG AAGGATGCCT TTCAGGAGGC AGACATCGTG GGAATTACCA TGCCAATCAC 480

TAAGTACAAT TACCAAGTTC GTGAGACAGC TGATATTCCG CGTATCATTA CGGAAGCTGT 540

CCATATCGCA ACTACAGGCC GTCCAGGGCC AGTTGTAATT GACCTACCAA AAGACATATC 600

TGCTTTAGAA ACAGACTTCA TTTATTCACC AGAAGTGAAT TTACCAAGTT ATCAGCCGAC 660

TCTTGAGCCG AATGATATGC AAATCAAGAA AATCTTGAAG CAATTGTCCA AGGCTAAAAA 720

GCCAGTCTTG TTAGCTGGTG GTGGAATTAG TTATGCTGAG GCTGCTACGG AACTAAATGA 780

ATTTGCAGAA CGCTATCAAA TTCCAGTGGT AACCAGTCTT TTGGGACAAG GAACGATTGC 840

AACGAGTCAC CCACTCTTTC TTGGAATGGG AGGCATGCAC GGGTCATTCG CAGCAAATAT 900

TGCTATGACG GAAGCGGACT TTATGATTAG TATTGGTTCT CGTTTCGATG ACCGTTTGAC 960

GGGGAATCCT AAGACTTTCG CTAAGAATGC TAAGGTTGCC CACATTGATA TTGACCCAGC 1020

TGAGATTGGC AAGATTATCA GTGCAGACAT TCCTGTAGTT GGAGATGCTA AGAAGGCCTT 1080

GCAAATGTTG CTAGCAGAAC CAACAGTTCA CAACAACACT GAAAAGTGGA TTGAGAAAGT 1140

CACTAAAGAC AAGAATCGTG TTCGTTCTTA TGATAAGAAA GAGCGTGTGG TTCAACCGCA 1200

AGCAGTTATT GAACGAATTG GTGAATTGAC GAATGGAGAT GCCATTGTGG TAACAGACGT 1260

TGGTCAACAC CAAATGTGGA CAGCTCAGTA TTATCCCTAC CAAAATGAAC GTCAGTTAGT 1320

GACTTCAGGT GGTTTGGGAA CAATGGGCTT TGGAATTCCA GCAGCAATCG GTGCTAAAAT 1380

TGCTAACCCA GATAAGGAAG TAGTCTTGTT TGTTGGGGAT GGTGGTTTCC AAATGACCAA 1440

CCAGGAGTTG GCTATTTTGA ATATTTACAA GGTGCCAATC AAGGTGGTTA TGCTGAACAA 1500

TCATTCACTT GGAATGGTTC GCCAGTGGCA GGAATCCTTC TATGAAGGCA GAACATCAGA 1560

GTCGGTCTTT GATACCCTTC CTGATTTCCA ATTGATGGCG CAGGCTTATG GTATTAAAAA 1620

CTATAAGTTT GACAATCCTG AGACCTTGGC TCAAGACCTT GAAGTCATCA CTGAGGATGT 1680




TCCTATGCTA ATTGAGGTAG ATATTTCTCG TAAGGAACAG GTGTTACCAA TGGTACCGGC 1740 

TGGTAAGAGT AATCATGAGA TGTTGGGGGT GCAGTTCCAT GCGTAGAATG TTAACAGCAA 1800 

AACTACAAAA TCGTTCAGGA GTCCTCAATC GCTTTACAGG TGTCCTATCT CGTCGTCAGG 1860 

TTAATATTGA AAGCATCTCT GTTGGAGCAA CAGAAGATCC GAATGTATCG CGTATCACTA 1920 

TTATTATTGA TGTTGCTTCT CATGATGAAG TGGAGCAAAT CATCAAACAG CTCAATCGTC 1980

AGATTGATGT GATTCGCATT CGAGATATTA CAGACAAGCC TCATTTGGAG CGCGAGGTGA 2040

TTTTGGTTAA GATGTCAGCG CCAGCTGAGA AGAGAGCTGA GATTTTAGCG ATTATTCAAC 2100 

CTTTCCGTGC AACAGTAGTA GACGTAGCGC CAAGCTCGAT TACCATTCAG ATGACGGGAA 2160 

ATGCAGAAAA GAGCGAAGCC CTATTGCGAG TCATTCGCCC ATACGGTATT CGCAATATTG 2220

CTCGAACGGG TGCAACTGGA TTTACCCGCG ATTAAAAATC CAACTTAAAT TTATTAAACC 2280 

AGCCTAAAAG GCAATAAATA ATAGAAAAGA GAGAAAAGCT ATGACAGTTC AAATGGAATA 2340

TGAAAAAGAT GTTAAAGTAG CAGCACTTGA CGGTAAAAAA ATCGCCGTTA TCGGTTATGG 2400

TTCACAAGGG CATGCGCATG CTCAAAACTT GCGTGATTCA GGTCGTGACG TTATTATCGG 2460

TGTACGTCCA GGTAAATCTT TTGATAAAGC AAAAGAAGAT GGATTTGATA CTTACACAGT 2520

AGCAGAAGCT ACTAAGTTGG CTGATGTTAT CATGATCTTG GCGCCAGACG AAATTCAACA 2580 

AGAATTGTAC GAAGCAGAAA TCGCTCCAAA CTTGGAAGCT GGAAACGCAG TTGGATTTGC 2640

CCATGGTTTC AACATCCACT TTGAATTTAT CAAAGTTCCT GCGGATGTAG ATGTCTTCAT 2700

GTGTGCTCCT AAAGGACCAG GACACTTGGT ACGTCGTACT TACGAAGAAG GATTTGGTGT 2760

TCCAGCTCTT TATGCAGTAT ACCAAGATGC AACAGGAAAT GCTAAAAACA TTGCTATGGA 2820

CTGGTGTAAA GGTGTTGGAG CGGCTCGTGT AGGTCTTCTT GAAACAACTT ACAAAGAAGA 2880

AACTGAAGAA GATTTGTTTG GTGAACAAGC TGTACTTTGT GGTGGTTTGA CTGCCCTTAT 2940

CGAAGCAGGT TTCGAAGTCT TGACAGAAGC AGGTTACGCT CCAGAATTGG CTTACTTTGA 3000

AGTTCTTCAC GAAATGAAAT TGATCGTTGA CTTGATCTAC GAAGGTGGAT TCAAGAAAAT 3060 

GCGTCAATCT ATTTCAAACA CTGCTGAATA CGGTGACTAT GTATCAGGTC CACGTGTAAT 3120

CACTGAACAA GTTAAAGAAA ATATGAAGGC TGTCTTGGCA GACATCCAAA ATGGTAAATT 3180 

TGCAAATGAC TTTGTAAATG ACTATAAAGC TGGACGTCCA AAATTGACTG CTTACCGTGA 3240 

ACAAGCAGCT AACCTTGAAA TTGAAAAAGT TGGTGCAGAA TTGCGTAAAG CAATGCCATT 3300

CGTTGGTAAA AACGACGATG ATGCATTCAA AATCTATAAC TAATTAGAAA TATATAGCGC 3360

TGGAGATGAT TTTATGAAAA AGATTATGAG AAAAATTGCA TCGTTATTAT TGGTTCTAGT 3420

TGTATAATGT AATTACACCG TCGGTAATAG TGCTAGCAGA CCAAAATAAA GCAGATTGGT 3480

CGTATGATGA AAATGCTGTA ATTAACATTT ATGATGATGC TAATTTTGAA GATGGTAGGT 3540

TGCATATGAA CTTTGAACAA TTCTTCAAAT TGGCACAAAT AGCTAGAGAA GAAGGTCTTG 3600

AAATTCATTC TCCGTTTGAG AGAGCTGGTG CGACTAAATC TGCTCGTTAT ATAGCGAAAT 3660

GGATTTTGAG AAATAAAAAA CATTAACAAA TATAGTTGGT AAATCATTAG GACCTAAATC 3720

AGCTGTTAGA TTCGGAGAAG CTTTATCCTA TATTGAAGGT CCTCTTCGCA GAATAAATGA 3780

GACGATAGAT GGCGGTTTAT ATCAAATAGA GCAAATTATT GCATCTGGAT TGAAAGAATC 3840

GGGTTTAAAT GACTGGACTG CGAAAACTTT AGCTTCAGCT ATTCGTGGGA TATTAGATGT 3900

ACTTATTTAG GGGTTGAAAT CATATGAATA TTACCAATTT GTTTTCTATC AAGACAGGAT 3960

GTGATGAAAC TGATAGGCAA CTGCAAAAAC TATTTTTTCA GTTGGATTTA CAATTGGGAG 4020

AATTGACAGA TCAACTAAGA AAATTAGATT CTAATTTTGT TCCTCGTAGT CAATTTGTAG 4080

ACACGTTGGA TTTGAATGAT GTAGAATATA AAGAAATTTT AAACTATTTT ATCTTCCATC 4140

GTAATGATAG TGAAGAAAGT TTGGTAGAAT GGTTATATGA TTGGATTTCC ACAAATCGTT 4200

ATGAACTTCC TAAAGAGTTT TCGATTCGTA TGGCTCATAA ATACCATGAA AGTGTTACTG 4260

AAGTTTTCGG AGATGAATAA CTAAAAAACA GTCATTAGTG ACTGTTTTTT ATAGAAAAAG 4320 

AGGTTTTATA TGTTAAGTTC AAAAGATATA ATCAAGGCTC ACAAGGTCTT GAACGGTGTG 4380

GTTGTGAATA CTCCACTGGA TTACGATCAT TATTTATCGG AGAAGTATGG TGCTAAGATT 4440

TATTTGAAAA AAGAAAATGC CCAGCGTGTT CGCTCCTTTA AAATTCGTGG TGCCTATTAT 4500

GCCATTTCCC AGCTCAGCAA GGAAGAACGT GAACGTGGGG TAGTCTGCGC TTCTGCGGGA 4560

AATCATGCGC AGGGAGTAGC CTATACTTGT AATGAAATGA AAATTCCTGC TACTATCTTT 4620

ATGCCCATTA CTACGCCACA ACAAAAGATT GGTCAGGTTC GCTTTTTTGG TGGGGATTTT 4680 

GTAACTATTA AACTAGTTGG AGATACCTTT GATGCCTCAG CCAAAGCAGC TCAAGAATTT 4740

ACAGTCTCTG AAAATCGTAC CTTTATTGAT CCTTTTGATG ATGCTCATGT TCAAGCAGGT 4800

CAAGGAACAG TTGCTTATGA GATTTTAGAA GAAGCTCGAA AAGAATCGAT TGATTTTGAT 4860

GCTGTCTTGG TTCCTGTTGG TGGTGGCGGT CTCATTGCCG GGGTTTCTAC CTATATCAAG 4920

GAAACAAGTC CAGAGATTGA GGTTATCGGA GTAGAGGCGA ATGGAGCGCG TTCCATGAAA 4980 

GCTGCCTTTG AGGCTGGAGG TCCAGTAAAA CTCAAGGAAA TTGATAAATT TGCTGATGGG 5040 

ATTGCTGTGC AAAAGGTAGG TCAGTTGACC TATGAAGCAA CTCGTCAACA TATTAAAACT 5100 

TTGGTAGGTG TCGATGAGGG ATTGATTTCT GAAACCTTGA TTGACCTTTA CTCTAAGCAA 5160 

GGGATAGTCG CAGAACCTGC TGGAGCGGCT AGTATCGCCT CTTTAGAGGT TTTAGCTGAA 5220

TATATTAAGG GGAAAACCAT TTGTTGTATC ATTTCTGGAG GAAATAATGA TATCAACCGT 5280

ATGCCAGAAA TGGAAGAGCG TGCCTTGATT TATGATGGTA TCAAACATTA CTTTGTGGTC 5340

AATTTCCCAC AACGTCCAGG AGCTTTGCGT GAGTTTGTAA ATGATATCCT GGGGCCAAAT 5400

GATGATATCA CACGTTTTGA GTATATCAAA CGAGCTAGCA AGGGAACAGG CCCAGTATTA 5460

ATTGGGATCG CTTTAGCAGA TAAGCATGAT TATGCAGGTT TGATTCGTAG AATGGAAGGT 5520

TTTGATCCAG CTTATATTAA CTTAAATGGT AATGAAACGC TTTATAATAT GCTTGTCTGA 5580 

GGACTAATAA AAAAATATCA TACCTTCATT TTGATTTCCT ATCTATTGAC AAGCATAGTC 5640

ACACTGTCTT TAATACTCTT CGAAAATCTC TTCAAACCAC GTTAGCTCTA TCTGCAACCT 5700 

CAAAACAGTG TTTTGAGCAA CTTGCGGCTA GCTTCCTAGT TTGCTCTTTG ATTTTCATTG 5760

AGTATAAGGT ATGATTTGAT TTCTTTTTGT TGACAAATAT ACTATATTAA AAAGATATAT 5820

AAGTAATTAA CTGAGCTTAT CTGTCTTGTC ATCTCTATTA AGGATGGTTT AGATAATCGG 5880 

GTGTCTGCTT CTAGGCTAGC ACCTCAATAT CCAAAGGAGT GATGAATTTG AAGGACATAA 5940

GGAATACCTA TCTCTCAGAT GATTTATTGA GGAAGAAAGA TAGGAGTTTT TGAGCTAGTG 6000 

AAGGCTTGGA TTTCTAAAGG TTAGAACTAT CATCTTCAGT TCTTAAATCG AAGAAATAAG 6060

CTATCTTACG GAAATAGAGA AGCATTTTTT AAGAACTTGA ATAATTTCGC ACCTTAAGAG 6120

GGTAATAATA CAGTATTTTT ATTAGCAAAT ATTTATGGTG TAGAGGCTAG CAAAACCTAT 6180 

ATATTATCGG ATTTAAAAAG GAAGTAAGAA A 6211



(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 7939 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9: 

CCGGACTCCC CACGATTCTT CAAAATAACT GAGTATATTT CTATCTTGAT TTTCAGATAT 60

AAATTCTTCC TTCTGTGGCC TCTTCTTACG CTTGAGAAGA GCTTCTCCGA CATGGCTTCT 120

TCCTTACTGA GCAAAACCTT GAGCATAGAT AAGTTTGACT GGCAAGCGTG CTCTTGTATA 180 

TTTGGCTCCC TTCCCACTAT TGTGGATAGC GAGGCGTCTT CTCATATCAG TCGTATAGCC 240 

TATATAGTAG GATCCATCAC GACACTCCAG AACGTACATA TAAGCCTTAT GATCCATAAT 300

AAATCTCTTC GATTTCGGGC GTATAAGAGC CATCATCATT GTGGACAATC AAAGGAGGTA 360

AGACCTTAAA GCCACTTGTT GAGCCATCCT TGATCGCCTC AATCAAAAGC ATATTGGCTT 420

CCTTTTCTCT TTTTGGATAA ACAAACTGCA GGCGCTTAGG GGCTAGATTA TGTCGTTTTA 480 

ACGTATCCAA AATATCCAGA AGTCGATCAG GACGATGAAC CATGGCCAAA CGCCCATTAG 540

ACTTGAGAAT ACTCTGGGCA CTACGACAGA TTTCTTCCAA ATTAGTCGTG ATTTCGTGTC 600 

GAGCCAAGAG ATAATGTTCA CTCTCGTTCA GATTAGAATA AGGATTCACC TTGAAATAGG 660

GTGGATTACA CAAAATCATA TCCACCTTAC TCCCCTGAAT GTGAGCAGGC ATATTTTTCA 720

AATCATCGCA GATGACCTGC ATTTGCTCCT CTAATCCATT CAAACGGACA GAGCGTTCAG 780 

CCATATCCGC CAAACGCTCC TGAATCTCAA CAGACAATAT CTGTGCTTGA GTACGAGTGC 840

TAGCAAAAAG CCCCACTGCT CCATTCCCAG CACAGAAATC CACAATCAAC CCCTTCTTAG 900 

GAAAACGTGG AAATCGTGAT AAGAGAACAC TATCCACCGA ATAGCTAAAA ACCTCTCTAT 960

TTTGAATGAT TTTGATATCT GTCGAAAAGA GCTGGTTAAT GCGCTCTCCT GATTTTAATA 1020

ATTGTTCTTC TTCCATGGTC CTATTATAGC AAATTCATAT TAACATTACA AAAAATATAA 1080

AACTCTAAAC TACTTCTTCT TTTTTAAATG GTGCAGGGCT TCTCCAGTCC AGATTGGTAG 1140 

CATTCGTCGA AAGGGAGCAA AGCCGTAGTT AAAGCGGTCG CTTGAAAAGC GTCTCCGTCT 1200

AGGAAACTGG TACTTTTCTT CCTCCAAAGT GCGGATAGAA AGACTGGCTT TCCCTGTAAA 1260 

TTCATCTAAA TCCACTACCT GAACTTGAAC CTCTTCATCG ACTTTCAAGG TTTCATGAAT 1320

ATTTTCAATA AATCCTGTCC GAATCTCTGA AATGTGAATC AGCCCCGTAT CACCCGTCTC 1380

TAACTCAACA AAGGCACCGT AGGGCTGAAT CCCTGTAATA CGCCCCTTTA GCTTATCACC 1440

GATTTTCATC TTAGTCCTCG ATTTCAATAG TTTCAATTAC AACATCTTCA ACTGGCTTGT 1500 

CCATAGCTCC TGTCTCAACA GCAGCAATGG CATCCAAGAC AGCGTAAGAT GCTTCATCAG 1560

CTAACTGACC AAAAACCGTG TGACGGCGGT CTAGGTGAGG TGTCCCACCT TGATTGGCAT 1620

AGATTTCTGC AATCGGTTCT GGCCAACCAC CACGAGTAAT TTCTTTCTTA GAATAAGGTA 1680 

GGTGTTGGTT TTGCACGATA AAGAACTGGC TGCCGTTGGT ATTTGGACCA GCATTTGCCA 1740 

TGGAAAGAGC ACCACGGATA TTGTAAAGCT CTTCTGAGAA TTCATCCTCA AAAGATTCGC 1800

CGTAGATTGA CTCGCCACCC ATACCAGTTC CAGTTGGGTC TCCACCTTGG ATCATAAAGT 1860 

CCTTGATAAT ACGGTGGAAA ATGACACCAT CATAGTAGCC ATCTTTTGAA AGAGATACAA 1920

AGTTAGCCAC TGTTTTAGGA GCATGTTCAG GGAAAAGCTT GATACGTAAG TCTCCGTGAT 1980 

TGGTCTTAAT AGTCGCAAGA GGACCTTCTA CTGTTTCAAT GTCTACTTGT GGAAAATGCA 2040

ATTCTTTTTC TACCATACCA AATACTTCTA AGGCAGCAAA AATGCCATCT TCTTCTAATG 2100

TTTTTGTAAT ATAATCTGCT TTTTCTTTGA TTTTATCATG AGAAATTCCC ATGGCAACGC 2160

TGATTCCAGC ATAATCAAAG AGTTCCAAGT CGTTGAGACC ATCTCCAAAA ACCATGACCT 2220

TCTCTGGTTT CAAGCCAAGG TGTTCCACAA CCTTTTCCAC CCCCGTCGCT TTGGAGCCTG 2280 

AAATCGGCAC AATATCAGAC GAATGTTGAT GCCAACGAAC CATGCGAAGT TTGTCTGAGA 2340

GACTGTCAGG CAAGTGCAAG TCATCTCCCT TATCTTCAAA AGTCCACATC TGATAGATAT 2400 

CTTCTTTTTC ATGGAAATCG GGATCTACAT CTAAGTCGGG ATAAATTGGA TTGATAGCTT 2460

CACTCATCAT ATCGGTGCGA GTCGACAACT TGGCATCATG ACTCCCAACC AAGCCATACT 2520

CAATTCCTTC TTGCTTAGCC CAAGAGATAT ACTCCTCAAC ATCTGACTTT TCAATCTGAT 2580 

GCTGATAAAT GACCTGACCT TTTTTATCTT CGATATAAGC CCCATTCAAA GTTACAAAAA 2640

AGTCAGGCTT GAGATCACGA ATCTCTGGAA CAACACCAAA AATGCCACGT CCAGAGGCGA 2700

TTCCTGTTAA AATTCCTTTT TCACGCAACT GTTTAAAAAC AGTGGGAATT GTAGTTGGAA 2760

TAAACCCTGT CTTTGAATTC CGCAATGTAT CATCAATATC AAAAAAGACA ATCTTGATCT 2820

TCTTTGCCTT GTATCTTAAT TTCGCGTCCA TCTCACTACC TCTTTCAATC TAACTCTTTC 2880

CATTATATCA TAAAGTAGGC AAATCCCCTA TTTTCAAAAA GTTTATCATT TTTATTTTAA 2940

TTTCTTGGAT GAGAAAAGAG ACATATTTAT GAAAAAGCTC CATCGTGCTT TTAATGTGTT 3000

CTCTTGTTTT CAAACTCGTA AAAAGGGAGC CACTGATCCT AACTCGCTCT CTCATTTCAA 3060

AGCTTGTGAA AAAAGACCCG TTGGGGTCTT AATTCGCTTT CTTGTTTTCA AGCTCATGAA 3120

AAAGAGACCC AACTGGGTCT TTTCTTTAAT CTTCGTTTAC GAAAGGCATC AAAGCCATTA 3180

CGCGAGCGCG TTTGATAGCT GTTGTTACTT TACGTTGGTT TTTAGCTGAA GTTCCTGTTA 3240

CACGACGAGG AAGGATTTTC CCACGTTCTG AAACGAAACG GCTAAGAAGC TCAGTATCTT 3300

TGTAATCAAC ATATTCAATT TTGTTTGCTG CGATGTAATC AACTTTTTTA CGGCGTTTGA 3360

ATCCGCCACG ACGTTGTTGA GCCATGTTTT TTCTCCTTTA TAAGTTTAGT TGTCCATTAG 3420

AATGGTAAAT CATCATCTGA AATATCCAAT GGGTTTGTTG CTCCAAATGG ATTTTCATTA 3480

CGTGAAAAGT CTGGTACTGA ATTTGTAGGT GCTGAATAGT TTGCAGTTGG TGCAGAGTAA 3540

GCTCCACCTG TGTGACCCTC ACGCACACTA CGGCTTTCCA ACATTTGGAA ATTCTCAGCC 3600

ACGACCTCTG TCACGTAGAC ACGTTGTCCT TGCTGGTTAT CGTAACTACG AGTCTGGATA 3660

CGACCTGTCA CCCCGATAAG TGAGCCTTTT TTAGCCCAGT TAGCAAGATT TTCAGCCTGT 3720

TGGCGCCACA TAACGACATT GATAAAATCA GCCTCACGTT CACCATTTTG ACTCTTAAAT 3780

GTACGGTTTA CTGCAAGAGT AAAAGTCGCA ACTGCTACAT TTGATGGGGT ATAACGCAAC 3840

TCAGCGTCAC GTGTCATACG CCCTACAAGT ACAACATTGT TAATCATAGT TTACCTTCTT 3900

ACGCGTCAAT TTTGACGATC ATGTGACGAA GAATGTCAGC GTTGATTTTT GAAAGACGGT 3960

CAAACTCTTT AAGAGCTGCA TCGTCATTTG CTTCAACGTT AACGATGTGG TAAAGTCCTT 4020

CACGGAAATC TTGGATTTCG TATGCAAGAC GACGTTTTTC CCAAGTTTTT GATTCAACAA 4080

CAGTTGCACC GTTGTCAGTC AAAATAGAGT CAAAACGTGC TACCAAAGCG TTTTTAGCTT 4140

CTTCTTCAAT GTTTGGACGA ATGATATAAA GAATTTCGTA TTTAGCCATT GATATGTTCC 4200

TCCTTTTGGT CTAATGACCC CAAGACTTTG CAAGGGGTAA GTGAGGTTCG CTCACAATAA 4260

ACTATTATAC TAGAAAAAAT TTTTTTACGC AAGTAAAAAC ACTAGAATTC GAAAAAACGC 4320

CACATGGGCG TTTTCCTGTT CTTATGGTTT GATACGGTGC AACATACGTG GGAATGGAAT 4380 

AGCTTCACGG ATATGTTTTG TTCCTGCTGC GAAGGTTACC ATACGTTCGA TACCGATACC 4440

AAATCCTCCG TGTGGAACTG TACCGTATTT ACGAAGGTCA AGGTAGAATT CATATTCTGT 4500

ACGATCCATG CCAAGTTCAT CCATCTTAGC GACAAGGGCA TCGTAATCTT CCTCACGCAT 4560

AGACCCACCG ATAATTTCTC CATAGCCTTC TGGAGCAAGC AAGTCTGCAC AAAGCACGCG 4620

CTCTGGATTT CCAGGAACTG GTTTCATGTA GAAGGCCTTG ATGGCTGCTG GATAGTTCAT 4680 

GACAAATGTT GGCACACCAA AGTGGTTTGA AATCCAAGTT TCGTGTGGTG ACCCAAAGTC 4740

ATCACCATGC TCAAGATGCT CGTAGTCAGC ATCTTCATCA TTTTCATGCT CTTGCAAGAG 4800

GTCAATGGCT TGATCGTAAG TGATACGTTT GAATGGCTCT GCAATGTAGC GTTTCAAGAG 4860 

TTCTGTATCA CGTTCCAAGG TTTCCAAGGC TTGAGGCGCG CGGTCAAGAA CACCTTGTAG 4920 

AAGAGCTTTC ACATAAGCTT CTTGCAAGTC AAGCGACTCA TCATGTGTCA AGTATGAGTA 4980 

CTCAGCATCC ATCATCCAGA ACTCAGTCAA GTGACGGCGT GTTTTTGATT TTTCAGCACG 5040

GAAAACTGGA CCAAAGTCAA AGACACGACC AAGAGCCATA GCCCCTGCTT CTAGGTAAAG 5100

CTGACCTGAT TGGCTCAAGT AGGCTGGCGT TCCGAAGTAG TCAGTTTCAA AGAGTTCTGT 5160 

AGAATCTTCT GCCGCATTTC CTGAAAGAAT TGGGCTGTCA AACTTCATAA AACCGTTCTT 5220

GTCAAAGAAC TCATAAGTTG CATAGATAAT AGCGTTACGG ATTTGCAACA CAGCTACTTG 5280

CTTACGAGAG CGTAGCCACA AGTGACGGTT ATCCATCAAA AAGTCTGTTC CGTGTTCTTT 5340

TGGTGTGATT GGGTAGTCTT GAGATTCACC GATCACTTCG ATGTCTGTGA TGTCCAACTC 5400 

ATAGCCAAAT TTAGAACGTT CGTCCTCTTT GACAATACCT GTCACATAAA CAGACGTTTC 5460 

TTGGCTCAAG CGTTTGATAA CATCAAACTT CTCAAGTCCC ACTTCTTCAC CAAATTTTTC 5520 

GACAAAGTTT GGTTTAAAAG CCACACCTTG AAAGAAGGCT GTTCCATCAC GCAATTGTAA 5580

GAAAGCGATT TTTCCTTTTC CTGATTTGTT GGCAACCCAA GCGCCAATCG TCACTTCCTG 5640

ACCAACATAG TCTTTTACGT CAATAATCGT TACACGTTTT GTCATTATTT TTCCTTTTCT 5700

TTTTTATTCT TTATGGCAAA CCACCTCTAT ATTGTTCCCA TCCAGGTCAA TCATAAAAGC 5760

AGCATAGTAA ATCGGATGCT CACTTCGATA ACCAGGAGCC CCATTGTCTC GCCCACCTGC 5820

CTCTAAGCCA GCCTCATAAC AAGCCTGAAC TTCTTCCTTA TTTTCTGCTA AAAAAGCAAA 5880 

ATGAACAGGA TCTTGTGTTC CCTGAGTCAG CCAAAAATCA CCACCAGGAT GAGGGCTGTT 5940

CGGGGATAGA AAACTAATTA GAGAACTAGT CTTAAAAGCC AATTTATAGT CCAAAGGAGC 6000 

GAGAAAACTC CTATAAAATC CTTATGAAAT TTGTAAATCC TTTACCTTAA TCTCAAAATG 6060

ATCAATCATT CTCACTACCC ATAAATGCTT TCAAGCGTTC GACTGCTTCT TTAAGCGTGT 6120

CTAGGTCTGT CGCATAGCTG AGGCGGACAT TTTCTGGTGC TCCAAATCCA GCTCCTGTTA 6180 

CCAAGGCCAC TTCGGCTTCT TCTAAGATAA CAGTTGTAAA GTCTGTCACA TCCGTGTAGC 6240

CTTTCATCTC CATGGCCTTT TTGACATTTG GGAAGAGATA GAAGGCCCCT TGCGGTTTGA 6300

CCACTTCAAA TCCTGGTACC TCTGCAAGGA GGGGATAGAT GGTATTAAGA CGTTCCTCAA 6360

AGGCCTGACG CATGCTTTCT ACAGTATCTT GCTCACCTGA TAGAGCCTCA ACTGCTGCAT 6420

ATTGGGCTAC TGCTGACGGA TTCGAAGTTG TTTGACCTGC AATCTTGGAC ATGGCAGCGA 6480

TAATGTCTGC TTCTCCAACG GCATAACCAA TCCGCCAACC AGTCATGGCA TAAGTTTTAG 6540

ACACACCATT GATGACCACT GTTTGCTTGC GAATCGCTTC CGATAGGCTA GAAATCGGTG 6600

TGAACTCATG ACCATTATAA ACCAAGCGGC CATAGATATC GTCTGCTAGG ATGAGAATAT 6660

CATTTTCTAC AGCCCAGTTT CCAATTGCCA AGAGTTCCTC ACGGGTGTAA ATCATACCTG 6720

TGGGATTAGA TGGCGAATTC AGCACCAAAA CCTTGGTCTT GTCAGTGCGA GCTGCTTCTA 6780

ACTGCTCTAC GGTCACCTTA AAGTGATTGT CTTCCTTAGC AGAAACAAAG ACGGGAACGC 6840 

CTTCTGCCAT CTTGACCTGA TCTCCATAGC TAACCCAGTA TGGGGTTGGG ATGATGACTT 6900

CATCACCTGG ATTGACCACA GCCATAAAGA AGGTATAGAG AGAATATTTG GCTCCCGCAG 6960 

CGACTGTCAC TTGATTTGAC GCTACAGAAT AGCCGTAAAA GCGCTCAAAG TAGCTATTGA 7020

CCGCCGCCTT AAGCTCTGGC AGACCTGAGG TTACTGTATA AAAAGAAGCA CGCCCATCTC 7080

GAATCGATGC AATGGCGGCA TCTTGGATAT TTTTGGGAGT AGTGAAATCT GGCTCACCCA 7140 

AGGTTAGAGA CAAAATATCT CTACCCTCAG CCTTCAGTGC TTTGGCACGG GCTCCAGCAG 7200

CCAAAGTCAC ACTTTCTTCC ATTTCTAAAA CACGGTTGGA TAGTTTCATA GGCCCTCCTT 7260

GTTGACCAAT GCTCCTGTTT CAAAATCTAC TAGATAAAAA TCAGATCCTG ACTTAACTTC 7320

CCAGATTGGC TTATCTTGAT AACGGCCAAA GGTTATCTTG TCAATCTCGC CAGCTCCCTT 7380

TTCCTTAGAA ACCGTTTCTG CTTTTTCTTG TGAAACACCC TGATTTAGCT GATAAACGTA 7440 
AATCTTATGG TCATCTTTAC CAATCAGGAC AGCAAGCGCT TCTTGCTGTT TGTTACGACC 7500

AAGAACGCTG TAATAAGATT CCAAGCCATT GTATAAATCA ACCTGATCAG CCTGCTCTAA 7560

TCCTGCATAC TGCTGAGCTA ATTTTTCTCC TTCACTTTTA GCTGTTTGAT AGGGTTTCAT 7620

GCTAAGAGAA ACCATATACA GAAAGGAACC ACTGATAACC ACAAACAAAA TCGTCATCCC 7680

TAGACCATAC TGCCACAGTA GATTATTTTT TGCTTTGTTT TGTCTTTTTT TCACTCGTCT 7740

ATTTTACCAT CTATTAAGCT TTATTACAAG TGAATATAAG AATACTCTTC GAAAATCTCT 7800

TCAAACCACG TCAGCTTTAT CTGCAGACCT CAAAGCTGTG CTTTGAGCAA CCAATTCTAT 7860

TTCTCCCTTC AAACAAAACC GATTTTGAAA GTGAAACAGT TCTTACTTTT TCAGTCACAA 7920

ATGATTAGAG TTTGCCGGG 7939

(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 9897 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10: 

CCGCTCTACC GTCAAATAAT TACCATTTTG TTTAATACCG AAATTTTTAT CTACTGAAAA 60 

TTCAGTTGGT CTGTTGGTAC GATCGTCGTA TACAGTACCA TTCTCACGAA TAGTATAATT 120

GTAATCAGTA TCACCTTGTT TCCTTAATTT AAGGTAATAA TTACCATCAA TTTGTTTATA 180 

ACCTGAATCT TTTCTAGTTG CTTCTCTAAA ACTTACTCCA GCAGGCATCA CATCAGCAAA 240 

CATGAGTACT TGTTTGTTCT TTTTTTCAAC AATAACAGAG TCAATATAGG TTGCACCACC 300 

GCTGATTTGT AAGTCACGTC CACCAACTTC ACGAGGCCAT TCTAATGGTA CTGGCGCAAA 360

ATCATCGAAT GCCAATGTTA ATTTTGGTTT AGTCCATGTC TTACCATTAT CATCACTATA 420

ACTTGTAGCA ATATTAATTT TATTCAAGAA ATCATGAGTT CCACCGTAAC GAGCGTCAAT 480 

GCTTGAAAAT ACCCGACCAT TGCTAAAAGT ATACAGAACT GGAATACGGA AATAGTTAGA 540

ACCTGTTGTA TCATTAGCCG TATAAATTAA ATGTCCAGTA ACAGCGTTTG TTGTCATCTT 600

TTTAACAGTT TCTTCATCCA ATGCACTATT AAAGAATTTG ATATTTTCTA GTGTTCCGTT 660

AAAACCAAAC GCCGTTTTTC CTGCACGTTT CACTCCCCCA AGCATATAGT AATCAATACC 720

TTTAATATCC TTGATGTTTA GGAAATTATC CACTTTCTTT TCTACTACTT TTGTACCATT 780

TGCGTATAAA GAATATGTTT TTTTGACTGA ATCTGCTACT ACTGCAACAG TGTTAGTCAC 840 

AGCCTCTTGT TTGTACTTAC CCCAAACTGA AGCAGGTCTG GATACTAGGT TATTTTTATT 900 
GGAAGAAGTA TCACGCGCTT CCATCCCCAA CTCACCATTG TCTCTAAGGA ACACATCTAC 960

ATAACTATTT TGTTGACCGG GTTTGGAATT AGATATTCCA AACAGAGCTT GTAAGCCTTT 1020 

CTCACTTGAC TGATTGTACT TAATCACTAC AGTAAAGTCA CCGCTAGTAA ATTTATCCTT 1080

TAACTCTTTA GTAACATTTT CTCCGCCCCC TGTTAAAGTA ACATTATTTT TTTCTAAGAC 1140

AGGAGTTTCT TCCGCTGTAG AAGATGGATC CTTAACAGTA GTTTCAACTG TTCGAGGTTG 1200 

TACAGTAACT TCCGAAGAGT TATCCGATGT AGGTTGTACT TCCGAAATCG GAGTCGTTGG 1260

TGCAACAGGT TGCACCAACT TTGGTGTTGA TACTTCAGAA GTTTCAGTCT CCTGAGCTGC 1320

AACTGAGTTA GCAACAAATG CTGATAATAC CACTACAGTA CCTAAGGTTA CATATTGTTT 1380

AATATTTTTT TTCATTTTAT TTTTCCTCGT TTAAAACTTT GATAACAAGT TTTTTAACAG 1440

TTTCATCATT GCAATGAATC TTTGGTTGGT GAAGATCTTC TTCAAAAGTC ACCAACATAT 1500 

TCCCTGGAAG CAATTCAACA ATTTGATAGT CTTTGCTATC GTAAAAAGCA ATATCCTTCT 1560

CTTCGCTAAA AGGTACACGT GACTGGGCAC GAACTGGGGA AGTTACTGCC ATTTTTTCAG 1620

TATTTTCAAC AACAATATGA ATATCTAAAT ATTTCTTATG AGTTTCAAAA ATATCTCCTG 1680 

GAACTCCATC AGCTAGATAA GTCATACAAT TTGCAAAAAC ATTTTCCCCG TCAATATCAA 1740

TTTTTCCATC AACTAAATCT GTCAAATTTG TATTTTCTAA AAAATCACAG ACTTTTGAAA 1800 

AATATTTATT GACAGAAGCA TATCGTTTAA AATCAGATTG TTCAGAAATA ATCATATTAT 1860

TTTCTCTTTT CTATTAGTGA CGAACTTCCC AACTTGAATC CGCTTTAATT TCTGTAATAT 1920

CATGAATCGT TGTATATTTA GGTGCAGATA CTTTATTTCC AGTAAGAACA GATACAATAT 1980

AACCTGAAAC TACTGATACA GAGATTGAAA TCAATGAATA TGCCCAGTAG CTAACAGCTG 2040

TTGGAGGAAG GAAGTATTTA ATAAATACCA TGACGATGGT TGATACAATC AGCGCTGCAT 2100 

AAGCACCTTG TTTATTTGCT TTTTTAGAAA CAAATCCAAG AATAAATACA CCACCAAGTA 2160 

GACCAAGTAC AAGTCCCATG AAACTATTGA ACCATTCGTA TGCAGATTTA ATATCTGAGT 2220

GAGCCATGAC AATGGAAACA CCAATTGAGA ATAAACCTAC TGCTAGAGAT ACGAATTGTG 2280

CAATTTTCGT ACGACGATTG TCTGACATAT TTTTAGAAAT GACATCTTGA ATATCCAATG 2340

TCCATGAAGT TGCAACAGAG TTCAAACCTG TTGAAATAGT TGATTGAGAT GCTGCATAAA 2400

TCGCTGCCAA GATCAAACCT GTGATACCTA CTGGTAACTG GTATGCAATA AAGTACATAA 2460

AGATTTGGTC TTGAGGGATA TTGCTAGCTG CACTATCTGC ATTTTGTACT TGATAGAATA 2520

CGTACAAGCC TGTACCAATC AAGTAAAAGA CTGTTGCAGT TGCAAGTGAC AAAACACCGT 2580

TTGTGAACAA CATCTTATTA AGTTTCTTAA TATTTTGTGT TGTAGTAAAA CGTTGAACCA 2640

AATCTTGAGA TGAAGCATAG GAAGACAAGA TTGTAAAGCC TGAACCCATC ACAATTAAAA 2700

AGATGGAGTT TGAAAGCAAG TTAGGATCGA AAAGTTTTTC ATTTGCAGCA AGGAATTTCC 2760

CGTTTGCTAA TGTTTCTGCT ACTGCACCAA AGCCACCTTT AATATTAGCA ATCAGTACAA 2820

ATAAAGCTAA AACGACACCA CTAATCAGAA TCACACCTTG AATAAAGTCT GTCCATAATA 2880

CGGATTTTAG ACCACCAGTA TAAGAATAAA CAATTGCAAC TACACCCATC AAAATAATCA 2940

AAATATTGAT GTCAATTCCT GTCAATACTG ATAAACCAGC TGATGGGAGG TACATAATGA 3000

TAGACATACG TCCCAATTGA TAAATAATAA ACAAGAGTGC TGAAATAATA CGAAGTGCTT 3060 

TAGAATTAAA ACGTTTATCC AAGTAATCAT ATGCCGTATC GATGTCTATC CGTGCAAAGA 3120 

TAGGTAAGAT AAAACGAATT GTCAGTGGAA TAGCTACTAC CATCCCTAAT TGAGCAAACC 3180

ATAAAATCCA GCTACCTGCA TAAGAGCTAC CAGCGAGTCC CAAGAAGGAA ATCGGACTGA 3240 

GCATTGTGGC AAAAATGGAT ACCGAAGTAA CATACCAAGG AACCGAACCA TCTCCTTTAA 3300

AGAACTCTTT TCCTTTCATC TCTTTTTTAG AGAAATAGAT ACCTGCAACC AACACCGCAA 3360

GTAAATAAAC AATCAAGATA ATTAAGTCAA TTATTGTAAA TCCTGTTGTG CCCATAACAT 3420

ATCTCCATAT TGATTTTATT TATTATAAAA ATTCTTTTCG TGCTTGTTGA ATAAGTTCTG 3480

CTGCTTGTTT TGCAACTTCC AAGTCACCTT CTGCCAATGC TTCTAAAGGT TGACGAACAG 3540

AACCTAAATC AAGTTTTTCA TTTAGACGCA AAACTTCTTT TGCTACAGCA TACATATTTG 3600

CCTTACCTGA TATCATCTTA TAGATAACTT CATTGATAGC ATATTGAAGT TTTTTAGCTG 3660

TATCTAAATC TCGTTCTTGA ATCAAACTTT CCAATTTCAA GAACAAATCT GGCATAACGC 3720

CATAAGTACC ACCAATACCA GCTTCTGCTC CCATCAAGCG ACCACCAAGA TATTGTTCAT 3780

CTGGACCATT GAATACAATG TAATCTTCTC CACCTGCAGC TACAAACATT TGAATATCTT 3840

GTACAGGCAT AGAAGAATTT TTAACTCCAA TCACACGAGG ATTTTGACGC ATTGTTGCAT 3900

ACAAACTACC AGTCAACGCA ACCCCTGCCA ATTGTGGAAT ATTATAGATA ATAAAATCTG 3960

TATTTGACGC AGCTTCACTC ATTGCATTCC AATATGCTGC GATTGAATAC TCTGGCAATT 4020 

TGAAATAAAT AGGTGGGATA GCTGCAATAG CATCGACTCC AACACTTTCT GAATGTTTTG 4080

CCAATTCGAT ACTATCTTTC GTGTTATTAC ATGCAATATG GTTGATAACT GTTAATTTAC 4140

CTTTAGCAAC TTCCATAACA GCTTCAATAA TTTGTTTACG ATCTTCTACA CTTTGGTAAA 4200 

TACATTCACC TGAAGAACCA TTTACATAGA TACCTTTTAC ACCTTTGTCA ATGAAATATT 4260

GTACCAGAGA TTTTACACGA TCTTGGCTAA TTTCACCATT TTCATCATAG CAAGCATAAA 4320

ATGCAGGGAT AACGCCTTTG TATTTAGTTA AATCTTTCAT CAGATTTCTC CTTTATATTG 4380 

TTTTTTATTT GATGACATTA ATAAATCGCT GAGCAATTTC TTTTGGACGT GTAATCGCTC 4440

CACCAATGAC TACACTGGTA ACACCTAAAC TATAAGCTTT TTTTAATTGT TCTGGATAAT 4500

GAATTTTTCt TCGGCAATTA CCGGAATATT AAAATCAGCC AATTTTTTCA TTAGTTCAAA 4560

ATCAGGCTCA TCTGATTGTA CACTTGTACT TGTGTAACCT GATAATGTTG TACCAACAAA 4620

ATCAACGCCT GATTTAAATG CATAGAGACC TTCATCTAAA TTACTTACAT CCGCCATCAG 4680

CAATTGATTC GGATATTTTT CTTTTATTTT TTTGATAAAT TCACTGACAA CTAAGCCATC 4740

ATATCTTGGT CTTAAAGTTG CATCAAATGC AATGACTGTT GTTCCGCATT CTACAAGTTC 4800

ATCTACTTCT TTCATCGTAG CAGTAATATA TGGTTCTTGA GGTGGATAAT CCCTTTTGAT 4860

AATTCCAATT ATTGGTAAAT CTACTACTTT CTGAATTGCT TTAATATCAC GCACAGAATT 4920

TGCGCGAATG CCCACTGCTC CTGCCTCTAA AGCTGCTTTA GCCATAAAAG GCATCAAGCT 4980

AAATTCTTCA TTATAAAGGG CTTCACCAGG TAAAGCTTGA CAAGAAACAA TGACTCCACC 5040

TTGAACTTGG CTTATAAATT TTTCTTTAGT CCAAATTTGG CTCATTTTAT TATTCCTCCT 5100 

TATGGATAAT AGTTTGATTG TAATAATATT GTCTCTCTGG ACTTTCCAGA TAATTAGAGA 5160 

ATAAGCAGTC TGTAATTAAA AGTATTGGAA ACTGAGGTGA TATGCGATTG CCATACGAGA 5220

GATGATCGGT CGAAGCTAAT AACAATAGTT CATCAAAGAA ACAATCTTCT TCGTCAAATT 5280

TTCTTGTAGT CATTAAAACT GTTTTAGCGC CTTTATCTGC AGCTTTTTGT AGACCTTCTA 5340

GTACAATATC AGTTTGACCT GAAATGGATG CTCCAATGAC AAGGCAATTT TCATTAAGTA 5400

GTAAGCTACT CCACAAAATC ATATCCTCGT CTGATAATAC TTCACCAATC ACTCCGAGAC 5460

GCATAAATCT CATCTTCATT TCTTGTAAAG CAAGAACAGA ACTTCCTTTA CCGTAGAGAT 5520

ATACACGCTC AGCAGTTTCT ATCATCTCAG CAATACGCTC AAGTTGAACT TCATCAAGAA 5580

CCGTGTAAGT TTTTCTCAAC ATTTCCTCAT AGTCGGATAA AACTTTTTCT GTTGCCTCTG 5640

TATATAATGC CAACTTTTCT TTCTCATGAA TCATCTCTTG GTATTTGAAA ATGAATTGTC 5700

TAAAACCTTT AAAACCACAT TTTTTCGCAA ATCGAGTCAA TGTTGCTTTG GATACATTAA 5760

GGTATTCGCA CAATGCTTTA GATGAATAAT CATTCAGAGG TTGCTGTTTT AAGAAGAATT 5820 

TAGCAATGTC TTTTTCAGCA TATGCCATAT TTGGTAAGTT AGCTTCTATC ATTGGAATTA 5880

GTTCTTTTTG CAGTAACATA TGAGCTCCTT AGTTGAAGTA AACGTTTACA TTCTTTATTT 5940

TAACACTTTT TTTTTTTTTC AATATTTTTC ATAAATTAGA AACTAGTTTC CAATTTCTTT 6000 

CGTTTCATAA CAGAACAACA AACATAAAAA TATAATAGTT TTTATTCTTT TTATCGTAAT 6060 

TATATGTATT GTAAGAACGT TTATCACTAA TAATATGTTC ATATTAAAAT ATTTTAGTAA 6120

TATTTTATTT TGGTTTTATT ATTTCTTTTC GGAATTTCTA TATAATATTT TATTTCTAAA 6180 
AAAATTGAAA AAATATTTCT AGTTTCTTTA TTTTATATAG GTAATATATT TTATTTCTAA 6240

ATTAAAAGAG AATCCCATAA AAACTACAGA TTTATGAGAT AAATCAGGTC ACCTATTTTA 6300

AAAAAGCAGC AAACTATAAA CTAAAAAGTT CCACACCAAA TGTAACCCCA TACTTCCCCA 6360

TAAGTCAGAT TTATAGCGCA CCATACCTAA AAACATTCCA AGTGAAACGT ACAGACACCA 6420 

AGCTAGAATG GTTCCTGGAT GATGTACTAA GGCAAATAAA ACACTTGTCA AAGCAACTCG 6480 

AATATCTAAT TTTCTAACCA AGTTCCATAA AATTTCACGA TACAGAAATT CTTCAACCAT 6540

ACTCGCATTG ATTAAGAACA ATAAAAATGA AAACCAAGGA ACTTGATGTT GAAGGCCAAT 6600 

TAAATTTGTT TGATTCGTGC TTCCTTGAGC ATGAATCAGG CTAAAACATA GACTTATAAT 6660

CAGTAGACTA GCTAGTCCAA TACCAAGGCA TTTCATCCTA GTTTTCATAT TGACCTTGAC 6720

CACTTGTTTT CGTTGACCAT ACATCCATAA AAAAGAAAAA AGAGACGCAC CATAGAGAAC 6780

CTGTAGTATA GTTAACTCAC CGATACAAAG AAATTTCAAT AAGTATAGAG ATACCAATAG 6840 

GACATTTACT TGTTGGAATA TATAAACTGG AATTATTCTT TTCATAGTTA CCTCCGAAAT 6900 

AAATCTTCAT AATCTAAATC TAATATCTGC ACAATCCTTT CTACCCATGG ACTTTGAGGC 6960

ATTCGTTGTT CCATCTTGTA GTGGCGAATC TTTTGATATA AACGATTCAA TTCACTTGGA 7020

TAGTGAAACT CTCCCGCAAA CATTTTTCTG GTTAACTCAA TCCAGCTGAT ATTTCTTTCA 7080 

GCCAAAATAA TGGACAAGTT CTCCCAAAAT CGTTCAGCCA TATTrCTTCT CCTTTAGTTA 7140

GATAAATAAT GTGTTTGyGC CATGTAAATC AATTGTTTCG TATCTCTTGG CAATAGAGCT 7200

CTAGCCTCTT CCAAATTCAG ACTTGGATAA ACCCGCTTAT TTGAAACCAC AAAAGGAAGT 7260

CCGATGGTTA GTTCAGGATT TTTTAAAATT ATCTCAACGA AATCCGTTAA TCTTAGATTG 7320

TCACGGTTCT TAAATCGTAA TAAATTGGGA GATAAAAACT CAAAACAATC TGAAGAATAG 7380

CTCATCATCT CAATTAATTT GTCCTTTGTC ATTTCAGAAA CTGAATGACA AGATACCTCA 7440

ATGCCATAGT TTTGGAAGAA GTCTAAAAGA AGTTGATTTC TTTGGCTATT TTTACTTAGA 7500

TAGAGATCAA TCATGGGAGA CCTCCAACAA ATTTGCTTCC ATTTGATATT CTGAGACGAT 7560

TAAGGAATCT AACAACTTTG AGAAGTTAAT CGATTTCTTG TCTTCATCAT AAGCTTTTAC 7620

AGTTACTTGG GTTGTAAGTA TCCCCTCTTT TCCCTCGGCT CGATAGTCTT GTCAATATAA 7680

AACAAAAACA AGATTCTGAT TATCATCTAC AAAGGCATTA ACTCCGTTCT TTATATCCTG 7740 

ACTTTCAAGG AATTCCATAA CGTTTTGAAG ATAGGATTCA TAAAATAGTG GGTAATTATG 7800

TTTTTTATGG TAATCATCTA AAAATGTTAC CTCAAACTCA CATGGATAAT TGGGCATCAA 7860

AAATATTTGT TCATCCAGCT GTTTGATTTC TGCATCATGT AATTCTGTTT CTAATTCATC 7920

ACAATCTAGT ATTGATTCTT TATTTAATGC TTTTATCTTT TTCCTCTATT TCTTTTAATT 7980

TCTTTGCGAT TGCGGCAATC ACAGGAACGG TTACACTATT ACCAACTTGT TTATAGAGCT 8040

GACTATTAAT AGAGACTTTT CTAGCAGCTT CAAAAGCCTA ATCAGGAAAG CCATGCAATC 8100 

GAAAACACTC TTTAGGAGTG ATTCGTCGTA TTCTCAAACG GTAAAATTGT CCATCTATTA 8160

AAACACCAGC TACTTGGTAA ACTTGTTTAT CTTCTCCTTC ATAGCTAGCC ACTACTACTC 8220

CCATTTGACC ACTAGTTGTT AACGTATTAG CTATACCTTT TCCAACTCTA CCACGACGAT 8280

ACTGAGAACT TGGTCTTTCT AAATTGATTG AATCCCCAAT CTCTGCTTGA GCATATCCTT 8340

TTTTCGTTGC TTCCCGTACT TTTAGAAATT GGATTGGTTC TGGAATTAGT ATTTTGGGGA 8400 

TTTTATCTCC TCCTTGCATC GTAGTCAGTG TTGGAGATAA GCCCTCACTT CCATAGACAC 8460

GACCTGTCTC CTTAAAGCTA GTCGGTAAAT CTCCAACAAC GACAATGCCA TAACGATCCT 8520

GAGTATTTAA AGTAAACATC GGCTCTTGAT TTTCCTTAAA GCGTCTCCCA TTTTGTCTCT 8580

TGTCTAATCT ATCTGGTGTC ATACAAGGAA TCGCAACTTT AAATCCTTCT CCTTTACCAC 8640

GAACTAAGGT TGGCGCAAGA CCTTCTGAAT AATAGACTTT ACCGCTCATT CCACTTCTTG 8700 

ATGGATTCAA ATTTCCTAGT GCTTTCAAAG TCTCAGAGTT AGTTGCTTGA CCTTCTCGTC 8760

TGAAAGGAAA TAAGAGTCTG GTACCTTTCT TTCTAGAATG TCCGATAATA AACACCCTCT 8820

CTCTGTTTTT GGGAACGCCA AAATCCTTAC TGTTAAGCAC CTGCCACTCA ACATCAAACC 8880 

CCAACTCATC AAGTGTGGTA AGTATTGTGG TGAACGTCCG TCCCTTATCG TGATTGAGTA 8940 

GGCCTTTAAC ATTTTCAAGA AAAAGAAAAC GTGGTTGGAT TTGTTTGGCC GCCCGAGCAA 9000 

TTTCAAAGAA CAAAGTTCCT CTAGTATCTT CAAATCCCAA TCGTCTTCCT GCGATTGAAA 9060 

ATGCTTGACA AGGGAATCCC CCACAGATGA CATCGACTTT CCCTCTAAGT TTTTTAAATT 9120

CGTCATCTGA AACATCTCGT ATGTCATGAA ATTCTATTTC TCCTTCCGTT TGAAAAATGG 9180 

ACTTATAAGA TTTCCTAGCA AATTTATCAA TCTCACAAAA TCCCAAGCAC TCATGCCCTT 9240 

GAGCTTCCAT TCCCATCCTA AAGCCTCCTA TCCCAGCAAA TAAATCTAAA ACCCAAATCA 9300

TTCATACCTC TCTCAACTAG ATGTAACTTA CAAAACCCCT GACCTCATGA GCCACTTTCT 9360

TCCTCCTCAT GAGGTCAGTT TTACTTTCTG CTGTTCCAGT ATCGTTTTTC CTCGCTAGAT 9420 

TTCCTCAAAA GGGCAGACTC CTCCCTTGGT TCGTCACACG ATTTTTTCAT CTCGACTGTT 9480

CTTTAATGCA TCATTAACGA CGCTTTTCTT CTAGGTGGTT CATAAGGAAC AGGAAGATTC 9540

AGGTTGACTT TTCTAATCCT AGAATAAAGT GCTGAAAACA ATTCGGAATA GGCATAGAGA 9600

CTAGACAATT TGAGGAGCTG CTTGCGTCCT GTTCGAACAC ATTTTCCTAC CACGTGAAGA 9660 

AAAAGATGGC GGAAGCGTTT GATTGTTAAA GTTTGGAAGT CACCTCCAGC TAGATGTTTG 9720

AGAAAAAGAT AGAGATTGTA GGCGATACAG CTCATCATCA TACGAACTCG TTTTTGATTA 9780

AGGTTGAACT ATCCGTTTTA TCGCCAAAAA ATCCCTCCTT CATCTCCTTG ATGAAATTCT 9840 

CGGCTTGACC ACGTCCACGA TAAAGCTGAA ACTGGTCTTG GCTTGTTCCG GTACCGA 9897



(2) INFORMATION FOR SEQ ID NO: 11: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 8148 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:

CCGTGGAACA AGCCAAGACC AGTTTCAGCT TTATCGTGGA CGTGGTCAAG CCGAGAATTT 60 

CATCAAGGAG ATGAAGGAGG GATTTTTTGG CGATAAAACG GATAGTTCAA CCTTAATCAA 120

AAACGAAGTT CGTATGATGA TGAGCTGTAT CGCCTACAAT CTCTATCTTT TTCTCAAACA 180 

TCTAGCTGGA GGTGACTTCC AAACTTTAAC AATCAAACGC TTCCGCCATC TTTTTCTTCA 240

CGTGGTAGGA AAATGTGTTC GAACAGGACG CAAGCAGCTC CTCAAATTGT CTAGTCTCTA 300 

TGCCTATTCC GAATTGTTTT CAGCACTTTA TTCTAGGATT AGAAAAGTCA ACCTGAATCT 360

TCCTGTTCCT TATGAACCAC CTAGAAGAAA AGCGTCGTTA ATGATGCATT AAAGAACAGT 420

CGAGATGAAA AAATCGTGTG ACGAACCAAG GGAGGAGTCT GCCCTTTTGA GGAAATCTAG 480

CGAGGAAAAA CGATACTGGA ACAGCAGAAA GTAAAACTGA CCTCATGAGG AGGAAGAAAG 540 

TGGCTCATGA GGTCAGGGGT TTTGTAAGTT ACATCTAGTT GAGAGAGGTA TGAATGATTT 600

GGGTAAATAC AATGAGCTTG AAAGAAGTAG CAAACTCACC AAGCGCCAAT TCTTTGAGAA 660 

TCAGATGCTG GATTATACCA TCATTGCGCA TGAGAGTTTT GAAATCATCC GTCATTCTGT 720

CTACCAGACA GATGATCGTG AAGTGGAAAA TGCTCTGGCT TTTGAAGTGA AAAATGATGA 780

AACAGACAAG CTGATTCTGT TATTAAGCGA GGATATTGGT GTAGGTGAAA AATTGTGCCT 840

CGTTGACGGA ACAAAAATGC GTGGAAAATG TTTAGTATAT GATAAAATAA ATGAGAGAAT 900

GATTCGCTTG CAGTGCTAGA AATAGGCATT TTGAATAGTG AATATGTTAT AATAAGTATT 960

AGTAGGAGGT GTTTTAGATT GGAGAAGAAA CTGACCATAA AAGACATTGC GGAAATGGCT 1020

CAGACCTCGA AAACAACCGT GTCATTTTAC CTAAACGGGA AATATGAAAA AATGTCCCAA 1080 

GAGACACGTG AAAAGATTGA AAAAGTTATT CATGAAACAA ATTACAAACC GAGCATTGTT 1140

GCGCGTAGCT TAAACTCCAA ACGAACAAAA TTAATCGGTG TTTTGATTGG TGATATTACC 1200

AACAGTTTCT CAAACCAAAT TGTTAAGGGA ATTGAGGATA TCGCCAGCCA GAATGGCTAC 1260

CAGGTAATGA TAGGAAATAG TAATTACAGC CAAGAGAGTG AGGACCGGTA TATTGAAAGC 1320

ATGCTTCTCT TGGGAGTAGA CGGCTTTATT ATTCAGCCGA CCTCTAATTT CCGAAAATAT 1380

TCTCGTATCA TCGATGAGAA AAAGAAGAAA ATGGTCTTTT TTGATAGTCA GCTCTATGAA 1440

CACCGGACTA GCTGGGTTAA AACCAATAAC TATGATGCCG TTTATGACAT GACCCAGTCC 1500

TGTATCGAAA AAGGTTATGA ACATTTTCTC TTGATTACAG CGGATACGAG TCGTTTGAGT 1560

ACTCGGATTG AGCGGGCAAG TGGTTTTGTG GATGCTTTAA CAGATGCTAA TATGCGTCAC 1620

GCCAGTCTAA CCATTGAAGA TAAGCATACG AATTTGGAAC AAATTAAGGA ATTTTTACAA 1680

AAAGAAATCG ATCCCGATGA AAAAACTCTG GTATTTATCC CTAACTGTTG GGCCCTACCT 1740

CTAGTCTTTA CCGTTATCAA AGAGTTGAAT TATAACTTGC CACAAGTTGG GTTGATTGGT 1800 

TTTGACAATA CGGAGTGGAC TTGCTTTTCT TCTCCAAGTG TTTCGACGCT GGTTCAGCCC 1860

TCCTTTGAGG AAGGACAACA GGCTACAAAG ATTTTGATTG ACCAGATTGA AGGTCGCAAT 1920

CAAGAAGAAA GGCAACAAGT CTTGGATTGT AGTGTGAATT GGAAAGAGTC GACTTTCTAA 1980

AATGAAGGAA AATGACTTGC AATCTCTGTT AAGAAATAAA ATAATCCCAC CTAGAACAAG 2040

CTAGGTGGGA TTATTTGCCT ATGAAATGAG AAATTATGGG AGCAAGCTCC TAAATCAACT 2100 

GTTTTTGATC TACTTCTTTA ACTACTTGAT AAAAGTTATA GAAGTAGGCC AAACTTGAAA 2160

TGATGGTTAC GACTAGGAAT ATTGAAAATT TCCATTGGAC AGGGTTGGTT AAAAGTTGTG 2220

GAAAGGATAT GAGGAGAAAG AAGAGGGCTG CGTTGAGGAC AGGTATCCGT TTTGATTGTA 2280

TTTTCTCAAG TCCTTTATTG AGCGCAGGAA GAAAGAGGAG TAGGAGTAGT AAAACTGTAT 2340

GAGAAATAGC TCCTGAAGTA AGGGCGAAGA AAAGGAAAAT ACTGATAAAA ACATGAATGA 2400

TCAGTAGTCT AGCTAGTGAT TTCATAAGGC ACCTCCTAAT CCTGGTCTTT TTTAGCTCTT 2460

GCAATACGAA GTGAGTCGAC AATATGTATC ATCACTCCGA AAAAGAAAGC TCCCAGTATA 2520

GTTTTAAAAA TATGTTTTGT ATTTAGAAGA GAACTGATAA AATTTGGATT TTCACTTGTT 2580

AGGGTATCAA TGAGTGGAAT TATAAAAAAT ATCACTGTTC CATAAATCGA ACCTGCTTTC 2640

AGACCAGGAT AACGTAACTG TTTCTTTTCT TTTTTCATGA GTTTCCTCCT AATCCTCATC 2700

TTGATTTTTC TTAGTTTTTG CAATGCGACG GGAGATGAGG AACTGTATGC TCGCTCCGAA 2760

GAAAATAGAA CCGAGAATAC TTGATACACC ATTTCTTATA GTGAGAAGAG AATGAAAATA 2820

GTCCTGACCT TCATCTATGA GTATCCTGAG AAGAGGAGTT ATAAAAAACA TCCATAGACC 2880

AAAGAACAAA CCTGCTTTCA GACCTGGGTA GTGTAGTTGC TTGCTTTCTT TCTCATTCAG 2940

CATATCTGGT TCAATGACTG TGATGCCTGT TTTTTTCATT TGGTAGGTGA CATAGCCAGA 3000

AGCGATGAGG GCAATCACTA AAATCAGAGG AGGATAGATT AGAGCCACTT CTTGAGGGTA 3060

TTTATAGGCC AGAAGGAGTG GAATAAGATT TCCGAAAATC ATCAGATAAA AGAGGATGAT 3120

AAAGACTTGG TTCCCAATAC TATCGGCCTC ACGCCGTTTG TATTCGTCAA GGGGACCAGA 3180 

AATACCGTAT GTGCGTTTGA TCAGTTTTTC AGTGAAGGTT TCTTTTTTCA TGAGTTTGCT 3240 

CCTTTTTTAA AAATCTTCCT CCCAAAAGAG ACTGTTGAGG TCAGTTTGGA GGCTGCGGGC 3300

GAGATTGAGA CAGAGTTCCA AGGTTGGATT GTACTTGTCG TTTTCAATCA TATTGATAGT 3360

CTGTCTCGAG ACACCGATAT CCTTGGCGAG TTCGAGCTGG GAAATACCCA ATTCCTTGCG 3420

AAATTCTTTC ACACGATTCA TCTGTTCTCC TTTCTGATTT ATGTCGTATA TATTTGACTA 3480 

TATTATAGTC TTTTAAACAT AAAGTGTCAA GTATTTTTGA CATATTTTTT GAAGAAATAG 3540

TAGTCTCCTT GTCCTATTTG TCTGACAAGT GCAAGCTGGT CGGATTTGTG GTAAAATAGA 3600

TAAGATATGA CAAAAGAATT TCATCATGTA ACGGTCTTAC TCCACGAAAC GATTGATATG 3660

CTTGACGTAA AGCCTGATGG TATCTACGTT GATGCGACTT TGGGCGGAGC AGGACATAGC 3720

GAGTATTTAT TAAGTAAATT AAGTGAAAAA GGCCATCTCT ATGCCTTTGA CCAGGATCAG 3780

AATGCCATTG ACAATGCGCA AAAACGCTTG GCACCTTACA TTGAGAAGGG AATGGTGACC 3840

TTTATCAAGG ACAACTTCCG TCATTTACAG GCATGTTTGC GCGAAGCTGG TGTTCAGGAA 3900

ATTGATGGAA TTTGTTATGA CTTGGGAGTG TCTAGTCCTC AATTAGACCA GCGTGAGCGT 3960

GGTTTTTCTT ATAAAAAGGA TGCGCCACTG GACATGCGGA TGAATCAGGA TGCTAGCCTG 4020

ACAGCCTATG AAGTGGTGAA CAATTATGAC TATCATGACT TGGTTCGTAT TTTCTTCAAG 4080

TATGGAGAGG ACAAATTCTC TAAACAGATT GCGCGTAAGA TTGAGCAAGC GCGTGAAGTG 4140 

AAGCCGATTG AGACAACGAC TGAGTTAGCA GAGATTATCA AGTTGGTCAA ACCTGCCAAG 4200

GAACTCAAGA AGAAGGGGCA TCCTGCTAAG CAGATTTTCC AGGCTATTCG AATTGAAGTC 4260

AATGATGAAC TGGGAGCGGC AGATGAGTCC ATCCAGCAGG CTATGGATAT GTTGGCTCTG 4320

GATGGTAGAA TTTCAGTGAT TACCTTTCAT TCCTTAGAAG ACCGCTTGAC CAAGCAATTG 4380

TTCAAGGAAG CTTCAACAGT TGAAGTTCCA AAAGGCTTGC CTTTCATCCC AGATGATCTC 4440 

AAGCCCAAGA TGGAATTGGT GTCCCGTAAG CCAATCTTGC CAAGTGCGGA AGAGTTAGAA 4500 

GCCAATAACC GCTCGCACTC AGCCAAGTTG CGCGTGGTCA GAAAAATTCA CAAGTAAGAG 4560 

GGAAAAAGAT GGCAGAAAAA ATGGAAAAAA CAGGTCAAAT ACTACAGATG CAACTTAAAC 4620

GGTTTTCGCG TGTGGAAAAA GCTTTTTACT TTTCCATTGC TGTAACCACT CTTATTGTAG 4680 

CCATTAGTAT TATTTTTATG CAGACCAAGC TCTTGCAAGT GCAGAATGAT TTGACAAAAA 4740

TCAATGCGCA GATAGAGGAA AAGAAGACCG AATTGGACGA TGCCAAGCAA GAGGTCAATG 4800 
AACTATTACG TGCAGAACGT TTGAAAGAAA TTGCCAATTC ACACGATTTG CAATTAAACA 4860

ATGAAAATAT TAGAATAGCG GAGTAAGATA TGAAGTGGAC AAAAAGAGTA ATCCGTTATG 4920 

CGACCAAAAA TCGGAAATCG CCGGCTGAAA ACAGACGCAG AGTTGGAAAA AGTCTGAGTT 4980

TATTATCTGT CTTTGTTTTT GCCATTTTTT TAGTCAATTT TGCGGTCATT ATTGGGACAG 5040

GCACTCGCTT TGGAACAGAT TTAGCGAAGG AAGCTAAGAA GGTTCATCAA ACCACCCGTA 5100 

CAGTTCCTGC CAAACGTGGG ACTATTTATG ACCGAAATGG AGTCCCGATT GCTGAGGATG 5160

CAACCTCCTA TAATGTCTAT GCGGTCATTG ATGAGAACTA TAAGTCAGCA ACGGGTAAGA 5220

TTCTTTACGT AGAAAAAACA CAATTTAACA AGGTTGCAGA GGTCTTTCAT AAGTATCTGG 5280

ACATGGAAGA ATCCTATGTA AGAGAGCAAC TCTCGCAACC TAATCTCAAG CAAGTTTCCT 5340 

TTGGAGCAAA GGGAAATGGG ATTACCTATG CCAATATGAT GTCTATCAAA AAAGAATTGG 5400

AAGCTGCAGA GGTCAAGGGG ATTGATTTTA CAACCAGTCC CAATCGTAGT TACCCAAACG 5460

GACAATTTGC TTCTAGTTTT ATCGGTCTAG CTCAGCTCCA TGAAAATGAA GATGGAAGCA 5520

AGAGCTTGCT GGGAACCTCT GGAATGGAGA GTTCCTTGAA CAGTATTCTT GCAGGGACAG 5580

ACGGCATTAT TACCTATGAA AAGGATCGTC TGGGTAATAT TGTACCCGGA ACAGAACAAG 5640

TTTCCCAACG AACGATGGAC GGTAAGGATG TTTATACAAC CATTTCCAGC CCCCTCCAGT 5700 

CCTTTATGGA AACCCAGATG GATGCTTTTC AAGAGAAGGT AAAAGGAAAG TACATGACAG 5760

CGACTTTGGT CAGTGCTAAA ACAGGGGAAA TTCTGGCAAC AACGCAACGA CCGACCTTTG 5820

ATGCAGATAC AAAAGAAGGC ATTACAGAGG ACTTTGTTTG GCGTGATATC CTTTACCAAA 5880

GTAACTATGA GCCAGGTTCC ACTATGAAAG TGATGATGTT GGCTGCTGCT ATTGATAATA 5940 

ATACCTTTCC AGGAGGAGAA GTCTTTAATA GTAGTGAGTT AAAAATTGCA GATGCCACGA 6000 

TTCGAGATTG GGACGTTAAT GAAGGATTGA CTGGTGGCAG AACGATGACT TTTTCTCAAG 6060 

GTTTTGCACA CTCAAGTAAC GTTGGGATGA CCCTCCTTGA GCAAAAGATG GGAGATGCTA 6120

CCTGGCTTGA TTATCTTAAT CGTTTTAAAT TTGGAGTTCC GACCCGTTTC GGTTTGACGG 6180 

ATGAGTATGC TGGTCAGCTT CCTGCGGATA ATATTGTCAA CATTGCGCAA AGCTCATTTG 6240 

GACAAGGGAT TTCAGTGACC CAGACGCAAA TGATTCGTGC CTTTACAGCT ATTGCTAATG 6300 

ACGGTGTCAT GCTGGAGCCT AAATTTATTA GTGCCATTTA TGATCCAAAT GATCAAACTG 6360 

CTCGGAAATC TCAAAAAGAA ATTGTGGGAA ATCCTGTTTC TAAAGATGCA GCTAGTCTAA 6420 

CTCGGACTAA CATGGTTTTG GTAGGGACGG ATCCGGTTTA TGGAACCATG TATAACCACA 6480 

GCACAGGCAA GCCAACTGTA ACTGTTCCTG GGCAAAATGT AGCCCTCAAG TCTGGTACGG 6540 
CTCAGATTGC TGACGAGAAA AATGGTGGTT ATCTAGTCGG GTTAACCGAC TATATTTTCT 6600

CGGCTGTATC GATGAGTCCG GCTGAAAATC CTGATTTTAT CTTGTATGTG ACGGTCCAAC 6660

AACCTGAACA TTATTCAGGT ATTCAGTTGG GAGAATTTGC CAATCCTATC TTGGAGCGGG 6720

CTTCAGCTAT GAAAGACTCT CTCAATCTTC AAACAACAGC TAAGGCTTTA GAGCAAGTAA 6780

GTCAACAAAG TCCTTATCCT ATGCCTAGTG TCAAGGATAT TTCACCTGGT GATTTAGCAG 6840 

AAGAATTGCG TCGCAATCTT GTACAACCCA TCGTTGTGGG AACAGGAACG AAGATTAAAA 6900 

ACAGTTCTGC TGAAGAAGGG AAGAATCTTG CCCCGAACCA GCAAGTCCTT ATCTTATCTG 6960 

ATAAAGCAGA GGAGGTTCCA GATATGTATG GTTGGACAAA GGAGACTGCT GAGACCCTTG 7020

CTAAGTGGCT CAATATAGAA CTTGAATTTC AAGGTTCGGG CTCTACTGTG CAGAAGCAAG 7080

ATGTTCGTGC TAACACAGCT ATCAAGGACA TTAAAAAAAT TACATTAACT TTAGGAGACT 7140

AATATGTTTA TTTCCATCAG TGCTGGAATT GTGACATTTT TACTAACTTT AGTAGAAATT 7200

CCGGCCTTTA TCCAATTTTA TAGAAAGGCG CAAATTACAG GCCAGCAGAT GCATGAGGAT 7260

GTCAAACAGC ATCAGGCAAA AGCTGGGACT CCTACAATGG GAGGTTTGGT TTTCTTGATT 7320 

ACTTCTGTTT TGGTTGCTTT CTTTTTCGCC CTATTTAGTA GCCAATTCAG CAATAATGTG 7380

GGAATGATTT TGTTCATCTT GGTCTTGTAT GGCTTGGTCG GATTTTTAGA TGACTTTCTC 7440

AAGGTCTTTC GTAAAATCAA TGAGGGGCTT AATCCTAAGC AAAAATTAGC TCTTCAGCTT 7500

CTAGGTGGAG TTATCTTCTA TCTTTTCTAT GAGCGCGGTG GCGATATCCT GTCTGTCTTT 7560

GGTTATCCAG TTCATTTGGG ATTTTTCTAT ATTTTCTTCG CTCTTTTCTG GCTAGTCGGT 7620 

TTTTCAAACG CAGTAAACTT GACAGACGGT GTTGACGGTT TAGCTAGTAT TTCCGTTGTG 7680

ATTAGTTTGT CTGCCTATGG AGTTATTGCC TATGTGCAAG GTCAGATGGA TATTCTTCTA 7740

GTGATTCTTG CCATGATTGG TGGTTTGCTC GGTTTCTTCA TCTTTAACCA TAAGCCTGCC 7800

AAGGTCTTTA TGGGTGATGT GGGAAGTTTG GCCCTAGGTG GGATGCTGGC AGCTATCTCT 7860

ATGGCTCTCC ACCAAGAATG GACTCTCTTG ATTATCGGAA TTGTGTATGT TTTTGAAACA 7920

ACTTCTGTTA TGATGCAAGT CAGTTATTTC AAACTGACAG GTGGTAAACG TATTTTCCGT 7980

ATGACGCCTG TACATCACCA TTTTGAGCTT GGGGGATTGT CTGGTAAAGG AAATCCTTGG 8040

AGCGAGTGGA AGGTTGACTT CTTCTTTTGG GGAGTGGGAC TTCTAGCAAG TCTCCTGACC 8100 

CTAGCAATTT TATATTTGAT GTAAGAATGG CACCCTGATG TTTCAGGG 8148



(2) INFORMATION FOR SEQ ID NO: 12:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 9909 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: double

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12: 

TACTCCACCC TTAATATCCG TTCCTGTAAA TACTTTACCG CTTTTAAGTT CATAGAATTG 60 

AACTTTTAAA TGCTTGTCTT CAAGCATCTT TTCCATCCAA TTTTTAGGAG TTTGACCAGC 120

TTTAAATAAA AACCTTGCTG GGGTGATTAG TATAGATTTA TCTGCGATTT TATAAGCTTC 180 

ATCAATAAAA TAGTGATATA TCGGCTCATC TCTGGCTTCT CCTGTTTCCT GATACGGAGG 240

ATTTCCTATC ACGACATCAA ATTTCATTTC ACTTTCCTCG CTAGATAGGC GCTCAAAACC 300

TATCATTCTA TTCTTTTTCC AGTCTTTGAT ATGGGTTTTA GATTCTTCTA CTTCTTGGAC 360

TTCTAGCTCA TCCGCAAACA AACTCAATTG TTGAGATTGC TTTTGTTTAG CTGAATAAGG 420

ACTACTTTTT TTCAATCCAT CCATCTGAAA GACATTGTAA GAGATAATAG TCGCAATTTC 480

TTTCTTTTGC TCTAATGTTG GTTGATTTCC AGTCTTAGCT AGATAATAGT CCTCAAAAGT 540

TGCCAAAAGA TTCTCACGCG CCAAAAGGAG AGAATCTCCT TGATACTCAT AACCATACGA 600

AGCATGATAA GCATCTTTTA CAAGTTTATA AAATGTGACT TCATCTGAAA CCTCACGACT 660 

AATCCGTTGC AGTTTTCTAT CAACAAAACC AACTCGCTCA GATAATGGAA TTTCCTCACC 720

AGTTACGGTA TCATATCTCG TTACCATATA AGGTGCTTCA CCACAAGTTA CCTCTAACCA 780

TCGTAAGTCC ACATACTCCT CAAGACTTAA CGAGCCTAAT TTCGATTCTA CATATCCATT 840

TTGCTTTGCG ACCAACCACG TTGGTGTAAA CACTTCTGCC CTTATTTTTG TCCGATCTTT 900

TTGTTCATAT TTGGATTTTT CAGATCTGGG CTGAATCAAG TTGGCAAAGT TTCCAGTAAC 960

CTTACTTGGA TTGATGCGAT CACTTGGAGC AAATCCCTTT CCTAACAATT CATAAGAATG 1020

CGTAnGCCAA ACAATTGATT TCTTTGTCGT TCGATCTTTT AAAAGAATTT TTAATAAGTC 1080 

AGCCGATTCT TTAGCCAAAC TTTCTTCACT AATATCTATT GTCATCAGCA ACCTCTCTTA 1140

TATTGTAAGC CCTATTATAT CATATTTTAA AGAATGAAAA TTTACTTGAA AAAAGTAATT 1200

CAATAAATAT CTCTCCGATG ACCAACTTCT AGAGTAGCAA CGACTAATTC ATCATCTACA 1260

ATTTGTACGA TAACTCGATA ATTACCAATT CTATAGCGCC ATTGACCAAC GCGATTACCA 1320 

ACCAAAGCCT TTCCGTGTCG TCTTGGGTCT TCCAAAACAT TGGTTTGTAA ATAGTTTGTA 1380 

ATTAGCTTCT GCGTATAACG GTCCAATTTT TTCAATTGCT TGATAAAACG TCTTGTTGGA 1440 

ACTAATTTAT ACAAATTATT CATCCTTCAA GCCTAAATCA TGCATCATTT CTTCCCAAGT 1500 

AATGGGTTCA ACTCCTTTTT CCAAGTCTTC TAAATACTCT TGATAGGCTA AATCTGCCAC 1560 
ACGAGCATCG TATTCATCTT CTAGGGCTTC AAGAGTTTTG GTGCGAATAA GTTCCGAAAG 1620

GGAAACTCCT TCAAACTTAG CCATTGCTTT CATAAATGTT TTATCAGCTT CAGAAACTTT 1680 

TAATGTAATA GTAGTCATCT TTTGTGCTCC CTTTTTTAAT GGTAACACCA TTGTATTACT 1740

TTTTAGGTGT TCAGTCAATA TAAAAAGAAC ACCTTCTCAG CGTTCTTTCT ATATCTCTGT 1800 

CAATGGTGTT GCGGTATCTG GTGAGGTATC ATAAACCTTA AAGTCTACTC CGACTCCCAG 1860 

ATCAGCTTGA GCCAGCTGAT TGACCATGGT CATATGAGCC AGTTCCTTGA TATTGTTTTC 1920

CTTAGATAAA TGCCCAAGGT AAATCTTCTT AGTACGATTT CCTAGCGTCC GAATCATAGC 1980

TTCAGCACCG TCCTCGTTAG AAAGGTGACC AAGGTCAGAT AGGATTCGTT GTTTGAGTCG 2040

CCAAGCGTAA GAACCTGATC GCAAAATCTC TACATCATGG TTGGCCTCGA TAAGATAACC 2100 

ATCCGCATTT TCGACAATGC CCGCCATACG GTCACTGACA TAACCTGTAT CTGTCAAGAG 2160 

GACAAAACTC TTATCATCCT TCATAAAGCG ATAGAACTGC GGTGCGACTG CATCATGGCT 2220

TACACCAAAA CTCTCGATGT CGATATCTCC AAAGGTTTTG GTTTTACCCA TTTCAAAAAT 2280

ATGCTTTTGC GAAGAATCCA CCTTGCCAAG ATATTTACTA TTTTCCATAG CTTGCCAGGT 2340 

CTTTTCATTG GCATAAAGAT CCATACCATA CTTGCGAGCC AAAACGCCTA CTCCATGGAT 2400

ATGATCTGAA TGCTCATGGG TAATCAAGAT GGCATCCAGG TCTTCTGGCT TACGGTTAAT 2460

TTCAGCTAGC AGACTGGTAA TTTTCTTGCC AGACAAGCCT GCATCTACTA AAAGCTTCTT 2520

TTTTGAGGTT TCCAGATAAA AAGAATTTCC ACTGGAACCC GACGCTAAAA TACTGTATTT 2580

AAAGCCTATT TCACTCATTC TAGTCTTCTA CTTCATCCTC CCATACTTCT TCTTTCACTG 2640

CATCCTTATC ATAAGGGAGT ACAATGGTAA AGGTTGAACC CTTGCCGTAT TCACTCTTGG 2700

CCCAAATAAA GCCCTTATGT TGTTTGATAA TTTCTTTAGC GATAGACAGT CCTAGACCTG 2760

TACCACCTTG TGCACGACTT CTAGCACGAT CCACACGATA GAAACGGTCA AAGATACGTG 2820

GTAAATCCTG CTTAGGAATC CCCAAACCGT GGTCAGAAAT GGATAAAATC ATCTGGTCTT 2880

CAGTTGTCTT CATTCTGACA GTGATTTTAC CCCCATCTGG CGAATACTTA ATAGCATTAT 2940

TTAAAATATT GTCGACAACC TGCGTCATCT TATCTGTATC AATTTCCATC CAGATAGAAT 3000

TGATGGGATA ATCTCTCACC AACTCATATT TTTTCTCCTT TTCCTGTCCT TTCATCTTGT 3060

CAAAACGATT GAGGATAAAG GTAATAAAAG CAGTGAAGTT AATCAGTTCC ACATCTAGGT 3120

GACTGGTAGC ATTATCAATA CGTGAAAGAT GGAGGAGATC CGTCACCATG CGCATCATAC 3180

GGTTGGTCTC ATCAAGAGAA ACCTTGATAA AGTCTGGTGC TACAGTTTCA CACAAAGCCC 3240

CCTCATCCAA GGCTTCAAGA TAGGATTTTA CGCTAGTCAG AGGAGTCCGT AACTCATGGC 3300

TAACATTGGA AACAAAGAGT CTTCGTTCGC GTTCTTCCTT CTCCTGCTCC GTCGTATCAT 3360
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GCAAAACAGC 


CACCAAACCT 


GAAATAAAGC 


CAGACTCTCG 


ACGTATCAAG 


GCAAAGCGAA 


3420 


CTCGAAGGTT 


CAAATATTCG 


CCATTGATAT 


CTTGGGAATC 


TAGCAACAAT 


TCTGGACTTT 


3480 


GGGTAATCAA 


ATCACGCAAT 


TCATAGTTTT 


CTTCTATCTT 


GAGCAATTCC 


AAAATGCTTC 


3540 


TATTCAGAAC 


ATCTTCCTTA 


ACCAACCCCA 


GTTGCTTCTT 


GGCTGTATCG 


TTAATCATGA 


3600 


TAATCTGACC 


CCGACGGTTA 


GTCGCAAGAA 


CCCCATCTGT 


CATATAAAAC 


AGAATACTAT 


3660 


TTAGCCTCTT 


ACTCTCTTGT 


TCTAGATTTT 


CCTGAGTGAG 


ACGAATAACC 


TCCGACAAGT 


3720 


CATTCAAATT 


ATTGGTAATA 


TTGGTGATTT 


CAGACCCACC 


TTGCATATCA 


AGAACCTTGG 


3780 


AATAATCTCC 


TGCAATCAAA 


TCTTTAACCT 


TTTGATTGAC 


TTGCTTCAAC 


TGAATATTAT 


3840 


CACGTCTATT 


TTCCAGTAAT 


AAGAGGGTCA 


CAACAAGGAT 


GAAACCTAAC 


AAAATCAGGA 


3900 


TAAAGATAAA 


ATCTCTGGTA 


AAAATGGTTT 


GTTTCAGTAA 


ATCAAGCATT 


ATTTCTCATG 


3960 


TAATACCCTA 


CACCACGGCG 


CGTCAAGATA 


TACTCTGGTC 


GGCTGGGCGT 


ATCTTCAATC 


4020 


TTCTCACGCA 


GACGTCGTAC 


AGTCACATCA 


ACTGTACGGA 


CATC AC C AAA 


ATAGTCATAA 


4080 


CCCCAGACAG 


TCTCAAGCAA 


GTGTTCGCGC 


GTGATGACTT 


GACCTGTATG 


CGATGCTAAA 


4140 


TGATACAAAA 


GCTCAAATTC 


ACGATGGGTT 


AAGTCTAGTT 


CTT C GC CAT A 


TTTTTTAGCC 


4200 


ACGTAGGCGT CTGGAACAAT TTCTAAATCC CCAATTTGGA TAGGTTGAGG TTTACTATCT 4260

GCTTCCTGAC CATCTACTGG CATAGGTTGA GAACGACGCA GAAGAGCTTT AACACGCGCC 4320

TGCAACTCAC GATTGGAGAA GGGTTTTGTT ACATAGTCAT CTGCCCCAAG TTCCAAACCG 4380

ATAACCTTAT CAAATTCACT ATCTTTGGCT GAAAGCATAA GAATGGGCAC ACTGCTTGTC 4440

TTACGAATGG TCTTAGCAAC TTCTAAACCA TCAATTTCTG GAAGCATCAA ATCCAGAATA 4500

ATAATATCTG GTTGCTCTGC TTCAAATTGC TCTAGCGCTT CACGACCATT AAAAGCAGTT 4560

ACAACTTCGT AACCTTCCTT GGTCATATTA AACTTGATAA TATCCGAGAT TGGTTTCTCA 4620

TCATCTACAA TTAGTATTTT TTTCATATGT TCACCTTTTT CTCTACTATT ATACCAAAAA 4680

AATAGTCAGA AGACACAATA GCTAGTCTTG GCTACTGTCT AAGTTGGCTT GTGCATAAAC 4740

CTGCCAGATT TTTTGTTGGG GTTTGGCAAG TGGGTAATTC TTGAATTCTT CTGGTGAAAG 4800

CCAGCGAACT TCCCTATCTG AAAAATCATG GAAGTCACTC ACCTGACCTG CTACAATCTG 4860

TACATGCCAT TTTCGATGAC TAAAAACATG CTGGACTGTA TCAAAACAAA CATCAAGCCA 4920

ATCAACATCT AGGTCATAGT CCTGCTGGAA ACTCTCTTCT GGACTGGGAC CAAAGTTCAC 4980

ACTTTCTTCC GCAACCTGAT GAAAGAGGTC AAACTGCTCT TCTTGCGAAA AGTTATCAAC 5040

TTCTATAAAG GGGAAATGCC AAAAACCTGC CAAGAGCTTT TCGCTTTCAT TTTTTTCAAG 5100

TAAAAATTGT CCTTGAGAAT TTTTCACAAC TAAGGCTTTA AGATAAATAG GAACCGGCTT 5160

TTTCTTAGGA GATTTAATTG GATAACGGTC CATGGTTCCA TTCTGATATG CCGCACTAAA 5220

GTCCTTGACT GGGCTTTCTT CAGGTCTGGG ATTTACAGGA GACTCAATAT CAGACCCTAA 5280

GTCCATCAAG GCTTGATTAA AATCACCCGG ACGATCCGGA TTAATCAAGA TCTCCATCAT 5340

TGCCTGAAAA ATTTTTCGAT TACTTGGAAT CCCAATATCG TGGTTGACTT CAAACAGACG 5400

CGCCAAGACC CGCATGACAT TACCATCTAC AGCTGGCTCA GGCAAGTTAA AAGCAATACT 5460

GGAAATGGCT CCTGCTGTGT AAGGTCCAAT CCCTTTCAAG CTGGAAATTC CTTCATAGGT 5520

ATTTGGAAAT TGGCCACCAA AGTCAGTCAT AATCTGCTGG GCTGCAGCCT GCATATTGCG 5580

AACTCGAGAA TAATAGCCCA AGCCCTCCCA AGCTTTCAGT AAACTCTCCT CAGGCGCAGT 5640

TGCCAGACTT TCGACAGTTG GAAACCAGTC CAAAAATCTT TCGTAGTAAG GGATAACTGT 5700

ATCCACCCTG GTCTGCTGAA GCATGATTTC AGATACCCAG ATGTGATAAG GATTTTTACT 5760

TCTCCTCCAA GGCAAATCTC TTTTGTTTTC ATCATACCAA GCGAGAAGTT TCTCACGGAA 5820

AGAAATGACT TTCTCCTCCG GCCACATGAC GATACCGTAT TCTTTCAAAT CTAACATATC 5880

TCTAGTATAA CACAGAAGGT TTCACCTGTC TTTGTATCTG ATTTATAATA TTTTCAATAG 5940

ATAGTATATA ACTTTTCTAT CTACTTATAC TCAATGAAAA TCAAAGAGCA AACTAGGAAG 6000

CTAGCCGCAG GTTGCTCAAA ACACTGTTTT GAGGTTGTGG ATAGAACTGA CAGAGTCAGT 6060

ATCATATACT ACGGCAAGGT GAAGCTGACG TAGTTTGAAG AGATTTTCGA AGAGTATAAA 6120

TCTTATTGAT GAACTGCTTG CAGTCTGAGA AAAAATGAGC TTGGATATTA TTTCCAAACT 6180

CACTTAAAGT CAATTTCAAT CCACTAGAAC AAGCCTAGTA CAGTTCCATC GCTTTCAACA 6240

TCCATGTTGA GAGCTGCTGG ACGTTTTGGA AGACCTGGCA TGGTCATAAC ATCACCAGTT 6300

AAGGCAACGA TGAAGCCTGC ACCTAATTTT GGTACCAATT CACGAATGGT AATTTCAAAG 6360

TTTTCTGGTG CTCCAAGCGC ATTTGGATTG TCTGAGAAAC TGTATTGAGT TTTAGCCATA 6420

CAGATTGGCA ATTTGTCCCA ACCGTTTTGA ACGATTTGAG CAATTTGTGT TTGAGCTTTC 6480

TTCTCAAAGT TCACTTTGCT ACCACGATAG ATTTCAGTGA CAATTTTTTC AATCTTTTCT 6540

TGGACAGAAA GGTCATTATC ATACAAACGT TTATAGTTAG CTGGATTTTC AGCAATTGTC 6600

TTAACAACTG TTTCGGCAAG TGCTACTCCA CCTTCTGCTC CATCAGCCCA GACACTAGCC 6660

AATTCAACTG GTACATCGAT TGAGGCACAG AGTTCTTTTA AGGCTGCAAT TTCAGCTTCT 6720

GTATCAGATA CAAATTCGTT AATAGCTACA ACTGCTGGAA TACCGAACTT ACGGATATTT 6780

TCAACGTGGC GTTTCAAGTT AGCAAAACCT GCACGAACTG CCTCTACATT TTCTTCAGTC 6840

AGAGCGTCTT TAGCCACACC ACCATTCATC TTAAGGGCAC GAAGGGTTGC GACAATAACA 6900

GAGATGTTGG CAAGTTTGGT GTCTTGATAT CAAGGAATTT CTCAGCACCA 6960

O A A A A A TTCAGTAACA GTGTAATCAG CCAAGTGAAG GGCTGTTGTC 7020

CAGAG I TACA GCCATGAGCG ATATTGGCAA ATGGACCACC GTGTACAAAG 7080

GCAGGTGTAC CGTAAATTGT CTGAACCAAG TTTGGCTTAA TAGCATCCTT CAAAATCAAA 7140

LjCCAAGGCAC CCTCAACCTG CAAATCACCT ACAGAAACAG GCGTACGGTC ATAGCGATAA 7200

CCAATAACGA TATTCGCCAA ACGACGTTTC AAGTCCTCGA TGTCCGTTGC CAAGCAAAGA 7260

ATTGCCATGA TTTCTGAAGC AACTGTAATA TCAAAACCAT CCTCACGTGG AATACCGTTT 7320

AGAGGACCAC CAAGACCAAC AGTCACATGG CGGAGCGTAC GGTCGTTCAA GTCCACAACG 7380

CGTTTCCAGA GGATACGACG TTGATCAATT CCCAGCTCAT TCCCTTGGTG CAAGTGGTTG 7440

TCAATCAAGG CAGAAAGGGC ATTGTTGGCA GTTGTAATAG CATGCATATC TCCAGTAAAG 7500

TGGAGGTTGA TGTCTTCCAT TGGCAGAACT TGTGCATACC CACCACCAGC AGCACCACCC 7560

TTGATCCCCA TGACTGGACC AAGAGACGGT TCGCGGATAG CAATCATGGT TTTCTTGCCA 7620

ATCTTGTTCA AGGCATCCGC AAGACCAATG GTAAGCGTCG ACTTTCCTTC ACCTGCAGGT 7680

GTTGGGTTGA TGGCAGTAAC CAAGATCAAT TTACCGACTG GATTGCTCTC AACTGCACGA 7740

ATTTTATCAA AGCTGAGTTT AGCCTTGTAC TTTCCGTACA ACTCCAAATC GTCATAAGAA 7800

ATACCAAGTT TCTCTACAAC ATCAACAATT GGCTTCAACT CAATACTCTG TGCGATTTCA 7860

ATATCTGTTT TCATTCAAAA TTCCTCTAAC CTCTTATATG ATAATTCATT ATATCACAAA 7920

ACAAGATTTT TAACATCCTA AAACTCTCTA AACGTTCGTA AATATCTCTG TTTTTAAGAC 7980

TTTTAGAGTC CTTTCTTAAA TTTTATATGG CTTTATAGTT TGAAACTATA ATAAATCTTC 8040

biill I ACCA AAAATTTATC ACTTTCATTT TACTTACCGC TTATTTTTGT GTACAATAGT 8100

vjC 1 A 1 CjAAAA TTTTAGTTAC ATCGGGCGGT ACCAGTGAAG CTATCGATAG CGTCCGCTCT 8160

ATCACTAACC ATTCTACAGG TCACTTGGGG AAAATTATCA CAGAGACTTT GCTTTCTGCA 8220

GGGTATGAAG TTTGTTTAAT TACGACAAAA CGAGCTCTGA AGCCAGAGCC TCATCCTAAC 8280

CTAAGTATTC GAGAAATTAC CAATACCAAG GACCTTCTAA TAGAAATGCA AGAACGTGTT 8340

CAGGATTATC AGGTCTTGAT CCACTCAATG GCTGTTTCTG ACTACACTCC TGTTTATATG 8400

ACAGGGCTTG AGGAAGTTCA GGCTAGCTCC AATCTAAAAG AATTTTTAAG CAAGCAAAAT 8460

CATCAGGCCA AGATTTCTTC AACTGATGAG GTTCAGGTTT TGTTCCTTAA AAAGACACCC 8520

AAAATCATAT CCCTAGTCAA GGAATGGAAT CCTACTATTC ATCTGATTGG TTTCAAACTG 8580

CTGGTTGATG TTACCGAAGA TCATCTGGTT GACATTGCAC GAAAAAGTCT TATCAAGAAT 8640

CAAGCAGATT TAATCATCGC GAATGACCTG ACTCAAATTT CAGCAGATCA GCACCGAGCT 8700

ATATTTGTTG AGAAAAATCA GCTTCAAACA GTCCAGACTA AAGAAGAAAT TGCAGAACTC 8760

CTCCTTGAAA AAATTCAAGC CTATCATTCT TAGAAAGGAA AACTATGGCA AACATTCTCT 8820

TGGCTGTAAC GGGTTCAATC GCCTCTTATA AGTCGGCAGA TTTAGTCAGT TCTCTAAAAA 8880

AACAAGGCCA TCAAGTCACT GTCTTAATGA CTCAGGCTGC TACAGAGTTT ATCCAACCTT 8940

TGACACTACA GGTACTCTCA CAGAATCCTG TCCACTTGGA TGTCATGAAG GAACCCTATC 9000

CTGATCAGGT CAATCATATC GAACTTGGAA AAAAAGCAGA TTTATTTATC GTGGTACCTG 9060

CAACTGCTAA CACTATTGCA AAACTAGCTC ACGGATTTGC GGACAACATG GTAACCAGTA 9120

CAGCTCTAGC CCTACCAAGT CATATTCCCA AACTAATAGC TCCTGCTATG AATACAAAAA 9180

TGTATGACCA TCCAGTAACT CAGAATAATC TGAAAACATT AGAAACTACG GCTATCAGCT 9240

GATTGCTCCT AAGGAATCCC TACTAGCTTG TGGAGACCAC GGACGAGGAG CTTTAGCTGA 9300

CCTCACAATT ATTTTAGAAA GAATAAAGGA AACTATCGAT GAAAAAACGC TCTAATATTG 9360

CACCCATTGC TATCTTTTTT GCTACCATGC TCGTGATACA CTTTCTGAGC TCACTTATCT 9420

TTAACCTTTT TCCATTTCCA ATCAAACCGA CCATTGTTCA TATTCCTGTC ATTATTGCCA 9480

GCATTATTTA TGGTCCACGA GTTGGGGTTA CACTTGGATT TTTGATGGGA TTACTTAGCT 9540

TGACGGTTAA CACGATTACG ATTCTACCGA CAAGCTACCT CTTCTCTCCC TTCGTACCAA 9600

ACGGAAACAT CTACTCAGCT ATCATTGCCA TCGTCCCACG TATTTTGATT GGTTTAACTC 9660

CTTACTTAGT CTATAAACTG ATGAAAAACA AGACTGGTCT GATTTTAGCT GGAGCCCTTG 9720

GTTCCTTGAC AAATACTATC TTTGTCCTTG GAGGAATCTT CTTCCTATTT GGAAATGTTT 9780

ATAATGGAAA TATCCAACTT CTTCTGGCAA CCGTTATCTC AACAAATTCA ATTGCTGAAT 9840

TGGTCATTTC TGCAATTCTA ACCCTAGCCA TTGTTCCACG ACTACAAACC TTGAAAAAAT 9900

AAAAACAGG 9909 
(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1126 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13: 

TAATTTTCAT ATAATAGTAA AATAGAATGT GTGATTCAAT AATCACCTCA AATAGAAAGG 60

AAATTCTATG TCAAATCTAT CTGTTAATGC AATTCGTTTT CTAGGTATTG ACGCCATTAA 120

TAAAGCCAAC TCAGGTCATC CAGGTGTGGT TATGGGAGCG GCTCCGATGG CTTACAGCCT 180

CTTTACAAAA CAACTTCATA TCAATCCAGC TCAACCAAAC TGGATTAACC GCGACCGCTT 240

TATTCTTTCA GCAGGTCATG GTTCAATGCT CCTTTATGCT CTTCTTCACC TTTCTGGTTT 300

TGAAGATGTC AGCATGGATG AGATTAAGAG TTTCCGTCAA TGGGGTTCAA AAACACCAGG 360

TCACCCAGAA TTTGGTCATA CGGCAGGGAT TGATGCTACG ACAGGTCCTC TAGGGCAAGG 420

GATTTCAACT GCTACTGGTT TTGCCCAAGC AGAACGTTTC TTGGCAGCCA AATATAACCG 480

TGAAGGTTAC AATATCTTTG ACCACTATAC TTACGTTATC TGTGGAGACG GAGACTTGAT 540

GGAAGGTGTC TCAAGCGAGG CAGCTTCATA CGCAGGCTTG CAAAAACTTG ATAAGTTGGT 600

TGTTCTTTAT GATTCAAATG ATATCAACTT GGATGGTGAG ACAAAGGATT CCTTTACAGA 660

AAGTGTTCGT GACCGTTACA ATGCCTACGG TTGGCATACT GCCTTGGTTG AAAATGGAAC 720

AGACTTGGAA GCCATCCATG CTGCTATCGA AACAGCAAAA GCTTCAGGCA AGCCATCTTT 780

GATTGAAGTG AAGACGGTTA TTGGATACGG TTCTCCAAAC AAACAAGGAA CTAATGCTGT 840

ACACGGCGCC CCTCTTGGAG CAGATGAAAC TGCATCAACT CGTCAAGCCC TCGGTTGGGA 900

CTACGAACCA TTTGAAATTC CAGAACAAGT ATATGCTGAT TTCAAAGAAC ATGTTGCAGA 960

CCGTGGCGCA TCAGCTTATC AAGCTTGGAC TAAATTAGTT GCAGATTATA AAGAAGCTCA 1020

TCCAGAACTG GCTGCAGAAG TAGAAGCCAT CATCGACGGA CGTGATCCAG TCGAAGTGAC 1080

TCCAGCAGAC TTCCCAGCTT TAGAAAATGG TTTTTCTCAA GCAACT 1126



(2) INFORMATION FOR SEQ ID NO: 14: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2520 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14:

CCGGCAACAA AAAAGAAAAA ATCAACAGTT AAAAAAAATC TAGTCATCGT GGAGTCGCCT 60 

GCTAAGCCAA GACGATTGAA AAATATCTAG GCAGAAACTA CAAGGTTTTA GCCAGTGTCG 120

GGCATATCCG TGATTTGAAG AAATCCAGTA TGTCCGTCGA TATTGAAAAT AATTATGAAC 180

CGCAATATAT TAATATCCGA GGAAAAGGCC CTCTTATCAA TGACTTGAAA AAAGAAGCTA 240

AAAAAGCTAA TAAAGTTTTT CTCGCGAGTG ACCCGGACCG TGAAGGAGAA GCGATTTCTT 300

GGCATTTGGC CCATATTCTC AACTTGGATG AAAATGATGC CAACCGTGTG GTCTTCAATG 360

AAATCACCAA GGATGCAGTC AAAAATGCTT TTAAAGAACC TCGTAAGATC GATATGGACT 420

TGGTCGATGC CCAACAAGCT CGTCGGATCT TGGATCGCTT GGTAGGGTAT TCGATTTCGC 480

CTATTTTGTG GAAGAAGGTC AAGAAGGGCT TGTCAGCAGG TCGCGTTCAG TCCATTGCCC 540

TTAAACTCAT CATTGACCGT GAAAATGAAA TCAATGCCTT CCAGCCAGAA GAATACTGGA 600

CAGTTGATGC TGTCTTTAAA AAGGGAACCA AACAATTTCA TGCTTCCTTC TATGGAGTAG 660

ATGGTAAAAA GATGAAACTG ACCAGCAATA ACGAAGTCAA GGAAGTCTTG TCTCGTCTGA 720

CGAGTAAAGA CTTTTCAGTA GATCAGGTGG ATAAGAAAGA GCGCAAGCGC AATGCTCCTT 780

TACCCTATAC CACTTCATCT ATGCAGATGG ATGCTGCCAA TAAAATCAAT TTCCGTACTC 840

GAAAAACCAT GATGGTTGCC CAACAGCTCT ATGAAGGAAT TAATATCGGT TCTGGTGTTC 900

AAGGTTTGAT TACCTATATG CGTACCGATT CGACTCGTAT CAGTCCTGTA GCGCAAAATG 960

AGGCGGCAAG CTTCATTACG GATCGTTTTG GTAGCAAGTA TTCTAAGCAC GGTAGCAAGG 1020

TCAAAAACGC ATCAGGTGCT CAGGATGCCC ATGAGGCTAT TCGTCCGTCA AGTGTCTTTA 1080

ATACACCAGA AAGCATCGCT AAGTATCTGG ACAAGGATCA GCTTAAGCTA TATACCCTTA 1140

TCTGGAATCG TTTTGTGGCT AGCCAGATGA CAGCGGCCGT TTTTGATACC ATGGCTGTTA 1200

AATTGTCTCA AAAAGGGGTT CAATTTGCTG CCAATGGTAG TCAGGTTAAG TTTGATGGTT 1260

ATCTTGCCAT TTATAATGAT TCTGACAAGA ATAAGATGTT ACCGGACATG GTTGTTGGAG 1320

ATGTGGTCAA ACAGGTCAAT AGCAAACCAG AGCAACATTT CACCCAACCG CCTGCCCGTT 1380

ATTCTGAAGC AACACTGATT AAAACCTTAG AGGAAAATGG GGTTGGACGT CCATCAACCT 1440

ACGCGCCAAC CATTGAAACC ATTCAGAAAC GTTATTATGT TCGCCTGGCA GCCAAACGTT 1500

TTGAACCGAC AGAGTTGGGA GAAATTGTCA ATAAGCTCAT CGTTGAATAT TTCCCAGATA 1560

TCGTAAACGT GACCTTCACA GCTGAAATGG AAGGTAAACT GGATGATGTC GAAGTTGGAA 1620

AAGAGCAGTG GCGACGGGTC ATTGATGCCT TTTACAAACC ATTCTCTAAA GAAGTTGCCA 1680

AGGCTGAAGA AGAAATGGAA AAAATCCAGA TTAAGGATGA ACCAGCTGGA TTTGACTGTG 1740

AAGTGTGTGG CAGTCCAATG GTCATTAAAC TTGGTCGTTT TGGTAAATTC TACGCTTGTA 1800

GCAATTTCCC AGATTGCCGT CATACCCAAG CAATCGTGAA AGAGATTGGT GTTGAGTGTC 1860

CAAGCTGTCA TCAGGGACAA ATTATTGAGC GAAAAACCAA GCGTAATCGC CTATTCTATG 1920

GTTGCAATCG CTATCCAGAA TGTGAATTTA CCTCTTGGGA CAAGCCTGTT GGTCGTGACT 1980

GTCCAAAATG TGGCAACTTC CTCATGGAGA AAAAAGTCCG TGGTGGTGGC AAGCAGGTTG 2040

TTTGTAGCAA AGGCGACTAC GAGGAAGAAA AGATGGCTCT TTGTCAACTG TAGTGGGTTG 2100

AAGTCAGCTA AGCTCGAGAA AGGACAAATT TTGTCCTTTC TTTTTTGATA TTCAGAGCGA 2160



TAAAAATCCG TTTTTTGAAG TTTTCAAAGT TCCGAAAACC AAAGGCATTG CGCTTGATAA 2220

GTTTGATGAG ATTATTGGTC GCTTCCAATT TGGCGTTAGA ATAGTGTAGT TGAAGGGCGT 2280

TGACGATTTT CTCTTTGTCC TTTAGAAAGG TTTTAAAGAC AGTCTGAAAA AGAGGATGAA 2340

CCTGCTTTAG ATTGTCCTCA ATGAGTCCGA AAAATTTCTC CGGTTCCTTA TTCTGAAAGT 2400

GAAACAGCAA GAGTTGATAG AGCTGATAGT GATGTTTCAA GTCTTGTGAA TAGCTCAAAA 2460

GCTTGTTTAA AATCTCTTTA TTGGTTAAAT GCATACGAAA AGTAGGGCGA TAAAAATGTT 2520



(2) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 10993 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15:

TTTTCTCGAT AATAACTTCC ACCTTATTAT TTGGGATACC CTCCTCTTCT TCACCACCAC 60 

GTTCATAGTA GTCATCGCGA TAGAGAAAAG CTACGATATC AGCGTCCTGC TCAATAGACC 120

CAGATTCACG AATATCAGAC AAGACCGGTC TCTTGTCCTG ACGTTGTTCT ACACCACGAG 180

AAAGCTGACT CAGAGCGATT ACTGGAACCT TCAATTCCTT GGCTAGTATT TTCAACTGAC 240

GAGAAATTTC AGAAACTTCT TGTTGACGAT TTTCTCGACC AGTTCCCGTG ATAAGTTGCA 300

AATAGTCTAT CAAAATCAAA CCAAGATTTC CAGTTTCTTG AGCCAATTTA CGAGAACGAG 360

AACGAATCTC TGTAATCCGA ATACCTGGCG TATCATCGAT ATAGATACTG GCGTTAGCTA 420

GATTACCCTG AGCAATAGTA TATTTTTGCC ACTCCTCATC TGTCAATTGC CCTGTACGGA 480

TAGAATGTGA CTCCACTAAG CCTTCTGCAG CTAACATACG ATCTACCAAG CTTTCCGCAC 540

CCATTTCGAG TGAAAAAATA GCAACCGTTT TGTCCAACTT AGTCCCAATG TTCTGAGCGA 600

TATTCAAGGC AAATGCTGTC TTACCAACTG CTGGACGAGC TGCTAAGATA ATCAACTCCT 660

CCTCATGAAG TCCTGTTGTC ATATGATCCA AATCACGATA ACCTGTCGCA ATACCTGTAA 720

TATCGGTCGT TTGTTGCGAG CGAGCTTCCA GATTTCCAAA GTTGAGATTC AACACATCTC 780

GAATGTTCTT AAACCCGCTT CGATTTGCAT TTTCACTGAC ATCAATCAAC CCTTTTTCTG 840

CCTGAGCAAT AATTTCATCA GCTGGTTGTG ACGCTTCGTA AGCTTGGTTG ACAGACTCTG 900

TCAACTTGGC AATTAAACGA CGTAGCATTG CTTTTTCTGC AACAATCTTA GCATAATACT 960

CCGCATTAGC AGAAGTTGGC ACAGAATTAA CAATCTCAAC CAAGTAAGAC AAGCCACCAA 1020

TATTCTGTAA ATCACCTTGA TTATCAAGGA TAGTACGAAC CGTTGTTGCA TCTATGGCAT 1080

CACCACGATC GGATAAATCG ACCATGGCTT GGAAAATCAA ACGATGGGCA TACTTAAAAA 1140

AGTCCCGAGA CTCAATGTAT TCTCGCACAA AAACAAGTTT ACTCTCATCA ATAAAGATAG 1200

CCCCTAAAAC GGATTGCTCA GCTAAGATAT CTTGAGGTTG TACTCGTAAC TCTTCTACTT 1260

CTGCCATCAG ACTTCCCTTC CTTTTACAAT CTTGTCAAGA AGGTGTAAAC TTATCCTTCT 1320

TTCACACGAA GATTGATTAC ACTTGTGATA TCTTGATAGA TTTTCACTGG CACATCAATC 1380

AAACCAACCG CTCGAATCGG AGCTTGTACT TGAATATGAC GTTTATCAAT CTTAATTCCA 1440

AATTGCTTTT GCAATTCTTC TGCAATCTTC TTATTGGTAA TAGAACCAAA GGTACGACCA 1500

TCTGGACCAA CTTTTTCAAC AAATTCTACA ACAGTTTCTT CTGCTTCAAG TTGTGCTTTA 1560

ATTGCTTTTC CTTCTGCAAT CATCTCAGCG TGAGCTTTTT CTTCCGATTT TTGTTTACCA 1620

CGAAGTTCAC CTACAGCTTG AGCAGTCGCT TCTTTGGCTA GATTCTTTTT GATAAGAAAG 1680

TTTTGCGCAT ACCCTGTTGG TACTTCCTTA ATTTCGCCTT TTTTACCTTT TCCTTTAACA 1740

TCTGCTAAAA AGATTACTTT CATTCTTCTT TCTCCTTTTC CTTCATTTCA TTTAATACAA 1800

TTTCTGTCAG TTTTTCACCT GCTTCTGACA AGGTTACATC TTTAATTTGA GCTGCTGCCA 1860

AATTAAAGTG GCCTCCACCG CCTAACTCTT CCATAATCCG TTGTACATTC AGTTTACTAC 1920

GACTTCGAGC TGAGATAGAG ATAAATCCTT GTGTATTCTT CGCAAGAACA AAACTCGCTT 1980

CAATACCTGA CATGGCTAAC ATGGCATCTG CTGCCTTACT AATAACAACT GTATCATAGC 2040

ATTTCATGTC CTTAGCCTCT GCTATTAGTA CATCTGAACC TAATTTACGC CCCTGTAAAA 2100

TAAGTTCATT GACCTCACGA TATTCTTCAA AATCTGTCGC AGCGATTTCC TGGATAGCAA 2160

TACTATCACT TCCGCGCGTT CTGAGATAGC TAGCAACATC AAATGTCCGA CTAGTTACTC 2220

GCGAGGTGAA ATTTTTAGTA TCCAACATCA TACCAGCCAT CAAGACACTT GCTTGCATAC 2280

GACTCAAACG ATTTTTCTTA GAATTCTGGA ACTGAATCAA TTCCGTTACC AACTCACTGG 2340

CACTACTTGC ACCACTTTCG ATATAAGTAA TAACCGCATT ATCTGGAAAA TCCTGATCCC 2400

TTCTATGGTG GTCAATAACA ATGGTTTGGG TAAATAAATC ATAAAATTCT TTTGATAATG 2460

TTAAGGCTGT CTTTGAATGG TCTACAAGAA TCAACAAAGA ACGATTGGTC ACCATCCCCA 2520

TTGCATCCTT AACAGACAAC AACTTCGTAA CTCCTTCTTT TTCTATGAAT GAAACAGCTC 2580

GTTCAATATC TGGAGACATT TGTTCTTCAT CATAAAGAGC ATAGCTATTT TCAATCACAT 2640

TGCTGGCGAA CAACTGCATA CCTACAGCAG AGCCCAAAGC ATCCATGTCT AAATTTTTGT 2700

GACCGACTAC AAAAACCTGA TCTACACTCC GAATCTTATC TGAAATAGCT GTCATCATAG 2760

CGCGCGTACG AGTCCGTGTA CGCTTGATTG AAGCAGCAGA CCCACCACCA AAATAAACTG 2820

GATTTTTCGT TTCGTCGTTT TCCTTAACAA CCACCTGGTC GCCACCACGT ACTTCAGCCA 2880

AGTTCAAATT GAGCAAAGCA ACTTTCCCTA TCTCATCATG ATTTCCATCG CCATAAGAAA 2940

ATCCCATACT TAAGGTCAAG GGCAACTGTC TCTGTTTCGA CTCTTCTCTG AAAGCATCAA 3000

TAACAGAAAA TTTATCATTC ATCAAGCCCT CAAGCACCGT GTAGTCAGTA AATAGATAAA 3060

ATCGATCCAT ACTTACCCGA CGAGAAAACA TCATGTGTTT TTCTGAAAAC TCTGATATAA 3120

AATTAGCTAC AAAACTATTG ATTTGACTAA TATCTGACTC AGAAGTTTCA TCCTCCAAAT 3180

CATCATAATT ATCCACAGAG ACAATCCCAA TCACTGGTCT ACTTGTTACC AATTCATCTG 3240

TTATGGCTTG TTCCCTGGAT ACATCTACAA AATACAAAAC ACCGGAAGAA GCATCCATAT 3300

GAACAGCATA ACGCTTCTCA CCAAGCTTGG CATAAGTAGA CGGATTTCCT ACTGAAGCCT 3360

TGATAATCGT TTGAACAGCT TCTAAATCAA AATCACCATC TTCCTTGGTC AAAATCAATT 3420

CAGCATAGGG ATTAAACCAC TCAACCTCTC CAGAAGATAA ATTCAATTTC ATAACACCTA 3480

CAGGCATCTG TTCCAATAGA GCTGTCAAAC TTTCTTCCGC TTGGTGGTTT ACATACTGTA 3540

TCTGTTCTAC ATCACTCCTT GTATAATGCA CTCTCAGTTT CTTAAATAAA AAAACATAGC 3600

CTCCTACAAA AAGAAACAAA ATTAAAACCG TCAACAGATT ATTATTAACA AAAATAATGA 3660

AAGTGGATAA GACTCCAAAC GCAATCAATC CTACTAGAAT AGGAAAAATT GGACTTACAT 3720

AAAATTTTTT CATTCAAAAC CTCTTGGCAC CCATTATACC ATAATACCCC TCAAAAAGCG 3780

ACTTTTTAAA AGTGTAATCA GTAATTCTAT CAATTATAAG AAAAAGGTAG TTTACAATTC 3840

AGTAAACCTA CCTTTACACA TATTGAAATT AAGATTCTTT AACCTCTAAC AAACCAATTT 3900

CGCCATCCTC ACGACGATAA ATCACATTGG TTGTCTGATC TTCAACATCC ACATAGATAA 3960

AGAAATCATG CCCCAATAAA TCCATTTGTA GAATTGCTTC TTCCAAATCC ATTGGTTTTA 4020

AATCAATTTG TTTTGAACGA ACAACTTTAG ACTGGACAAT ATTTGAATCT TCCACCAAAG 4080

CATCTGTAAA TAATTGACCA GTTGCTACCT TATTTTTATT TTTACGCTCG ATTTTTGTTT 4140

TATTTTTACG AATCTGACGT TCAATTTTAT CAGTTACAAG GTCAATTGAA CCATACATAT 4200

CTTGAGATAC ATCTTCTGCG CGGAGAGTAA TAGATCCAAG CGGAATCGTT ACTTCCACTT 4260

TAGCCGTTTT TTCACGATAA ACTTTTAAGT TAATTCGGGC ATCCAACTCT TGTTCTGGTT 4320

GGAAGTACTT TTCGATCTTT TCGAGTTTAG AAACTACATA ATCACGAATT GCTTCTGTTA 4380

CTTCTAGGTT TTCACCACGG ATACTATATT TAATCATATG AGTACCTTCT TTCTAAACAT 4440

TTTTGTTTTT ATGATTTTAT TATAACGCTT TCATTCTATT TTTGCAAATT TTTTCCTCAT 4500

CTTACAAGGG AAAATGTTTT TACATCCTTA GCACCAGCTT CTTCCAACAG TTTCTTAACA 4560

CGATTTATAG TTGCTCCTGT AGTATAGATA TCATCTATAA GTAGGATTTT TTTAGGAATA 4620

GTGACTCCAC TTTTAATAAA GAAAGGAAGT TCTGTCCCCA AGCGCTCTGA ACGATTTTTA 4680

GAAGAACTGG CTCTCTCTTC TCTTTTCTCT AATAAATCCA GATACTCAAA GCCTGCTGCC 4740

TCTACCAAGC CCTCAACCTG ATTAAATCCT CTATTAGCAT ATCTATCAGG ACTTAGGGGA 4800

ATTACAACAA ATTGATACTC TTTGTACTTT TTCAACTCCT CACTTAAAAA TGAAGCGAAA 4860

ACTTTTCTTA ACAGGAAGTC TCCATCAAAC TTATACCGAC TGAAAAAATC CTTCATAGCT 4920

TGATTGTAAG TAAAAATCGC TCTATGACTG ACTTCAACTC CCTCTTTACA CCAAAGTTGA 4980

CAATCTTGAC ACTTTGTTGA CAACTCTGTT TTCATACAAT TTGGACAGTT CTCTTCCCCA 5040

ATTCTTTCAA AAGTAGAATC ACAGTCTGAA CAAAGACAAG AGTCATCATT CCTCAGAAGT 5100

AAGAGACTAC TAAAAGTTAA AACAGTCTTC ATAGTCTGCC CACATAACAA GCACTTCATA 5160

GACCAGCCTC CTTATTCATC ATCTGAATTT CCTTAATCGC CTTCTTGATT GAAGCATTTA 5220

ACCCATCATG GAAGAAAAGC AAATCTCCTG TCGGTCTATC CATGCTTCGT CCAACTCGTC 5280

CACCAATCTG AATCAAACTA GACTTGGTAA ACAAACGATG ATTGGCCTCT ACTACGAAAA 5340

CATCCACACA AGGGAAGGTA ACTCCGCGCT CCAAGATTGT CGTACTGATA AGTATTGTCA 5400

GTTCTCCATC TCGAAAAGCT TGTACTTGCT CTAATCGATC CTCTGTTACA GAAGATACAA 5460

AGCCAATTTT CTCATTTGGA AATTGCTCCT GTAAGATTTC TGCTAACTGC TCCCCTTTCT 5520

TAATTTCTGA AGCAAAAATG AGTAACGGAT AAGCTGTCTT TCTCTGCTTC TCAATATAGG 5580

ACTTTAACTT TGGTGACAAA CGATTCTTGT CTAAGTAGCG ATTAAAATCC GATAACCAAA 5640

TTGGTTTTGG AATAATCAAC GGATTTCCAT GAAACCGTCT CGGTAAATTC AGTCTTTTTA 5700

GTTCTCCTAA ACGGACCTTT TTATCTAACT CATTGGTCGA AGTCGCTGTT AAAAAGATTC 5760

TCAATCCATT CTCCTTTACA CTATTCTTGA CAGCGTGGTA AAGCATGGGA TTATCAACAT 5820

AAGGAAAAGC ATCTACTTCA TCCACTATCA GCAAATCAAA AGCTTGATAA AACTTCAATA 5880

ACTGATGGGT TGTTGCAACA ACTAGTGGTG TTCGAAAATA AGGTTCCGAT TCTCCATGTA 5940

GCAAAGCTAT CCCGCAAGAA AAATCCTGTT GCAGGCGCTT GTACAGCTCC AAACAAACAT 6000

CTATGCGAGG ACTAGCCAAA CACACTGCAC CACCCGCATT GATCACTTTA GCCACTACTT 6060

GATAAATCAT TTCTGTCTTT CCAGCTCCTG TTACCGCATG AACTAAGGTT GGCTTTTGCT 6120

TGTCTACTAC TTGAAGCAAT CCCTCTGACA CCTTCTCTTG AAAAGGAGTT AATTGGCCGC 6180

GCCATTTGAG AACATCTTGC TTTGGAAAAT CCTCCTGCGG AAAATAGTAT AAAGTTTGAT 6240

CACTTCTGAC TCGCTTCATC AGCAAGCACT CTCGACAATA GTAAGCACCG ATGGGCAAAT 6300

ACCATTCTTC TAGAATAGTA CTATTACAGC GTTGACAGAA AAGTTTCCCC TTCTCCTTTC 6360

TCATTGCTGG AAGTTTCTCC GCCAACTGAC GTTCTTCTTC TGTTAATTCA TTCTCAGTAA 6420

ATAAACGACC GAGATAATCT AAATTTACTT TCATACTTCT TTATTCGTAA AAACTAGCAC 6480

TTTAGATGAT TTTTTAGTAC AATTAAATCA TGGAATTTAG GACAATTAAA GAGGACGGTC 6540

AAGTCCAAGA AGAAATCAAA AAATCTCGCT TTATCTGCCA TGCCAAGCGT GTTTATAGCG 6600

AAGAAGAGGC TCGTGACTTC ATTACTGCCA TCAAAAAAGA ACACTACAAA GCGACACATA 6660

ACTGCTCTGC CTTCATTATT GGAGAACGTA GTGAAATTAA ACGTACAAGT GATGATGGTG 6720

AGCCTAGTGG TACTGCTGGT GTTCCCATGC TTGGGGTACT AGAAAATCAC AATCTCACCA 6780

ATGTCTGTGT GGTCGTGACA CGCTACTTTG GTGGTATTAA ACTAGGCGCT GGAGGACTAA 6840

TTCGTGCTTA CGCCGGCAGT GTCGCCTTAG CTGTCAAAGA AATTGGTATT ATTGAAATAA 6900

AAGAACAGGC TGGCATTGCT ATTCAAATGT CTTATGCTCA GTACCAAGAG TACAGTAACT 6960

TCCTTAAAGA ACATGGTCTC ATGGAGCTGG ATACAAACTT TACAGATCAA GTCGATACGA 7020

TGATTTATGT TGATAAAGAA GAAAAAGAAA CTATTAAAGC TGCACTTGTG GAGTTTTTTA 7080

ATGGAAAAGT CACTTTAACT GACCAAGGTT TACGAGAGGT TGAAGTTCCT GTAAACTTAG 7140

TGTAAACAAT GAATAATACA GCGTTTCGTT GACATTCTCA CAACTACTTT AGCGAGCAAA 7200

ATAAAAAGAG GCGTACCAAA ATATACTAGA AAATGAAGCA ATTCAAACGA AACCTGATAT 7260

CGTTTTCCTT CACACCTATT TACTAGAATT AGCTGAACGC AATCACTTGA AAATTAATGA 7320

CTTTGATCTA TGATATATAG AAATGGTATG GATAGCGTTA TACTAAAGAT ATCTTATACA 7380

AAGAGGTATT CATATGTCTA TTTATAACAA CATTACTGAA TTAATCGGTC AAACACCGAT 7440

TGTTAAACTT AACAACATCG TGCCAGAAGG TGCTGCAGAC GTCTATATAA AGCTTGAAGC 7500

ATTTAATCCT GGTTCATCTG TAAAAGACCG TATTGCCCTT AGCATGATTG AAAAAGCTGA 7560

ACAAGATGGT ATTCTGAAAC CTGGTTCTAC TATTGTTGAA GCAACAAGTG GAAACACCGG 7620

TATTGGACTT TCATGGGTAG GTGCTGCTAA AGGGTATAAA GTCGTCATCG TTATGCCTGA 7680

AACTATGAGT GTAGAACGAC GTAAAATTAT CCAAGCTTAT GGTGCTGAAC TCGTCCTAAC 7740

TCCTGGTAGC GAGGGAATGA AAGGTGCTAT TGCTAAGGCT CAAGAAATCG CTGCTGAACG 7800

TGATGGTTTC CTTCCTCTTC AATTTGACAA TCCAGCTAAT CCAGAAGTAC ACGAAAGAAC 7860

AACAGGAGCT GAGATACTAG CTGCTTTCGG TAAAGATGGA TTAGATGCCT TTGTTGCTGG 7920

AGTAGGTACT GGTGGAACGA TTTCTGGTGT TTCTCATGCA CTCAAATCAG AAAATTCTAA 7980

CATTCAAGTT TTTGCAGTAG AAGCAGATGA ATCTGCTATT CTATCTGGTG AAAAACCTGG 8040

TCCTCACAAA ATTCAAGGTA TCTCAGCTGG ATTTATTCCT GATACACTTG ATACTAAAGC 8100

CTATGATGGT ATCGTTCGTG TAACATCAGA TGACGCTCTT GCACTCGGAC GTGAAATTGG 8160

TGGAAAAGAA GGCTTCCTTG TAGGGATTTC CTCAGCTGCA GCTATCTACG GAGCCATCGA 8220

GGTTGCCAAA AAATTAGGTA CAGGTAAAAA AGTCCTTGCC CTAGCACCAG ATAACGGTGA 8280

ACGTTATCTC TCTACAGCAC TTTATGAATT GTAACCGTCC AATAACGAAG TCTATTGAAA 8340

AATCTCCAGA CTAGAGAACT CACGGATAGT TCCTAATCTG GAGATTTCTT ATTTGCACTT 8400

TTCTTGTACA ACTTTAGTCC ATGGTAAATA GGCCTCTAAA ACCTCTTTGT TTACGAGAGT 8460

TTCCACGTTT GGAAGACATT CTAGAAGATA GGATAGATAT TTCTCACTAT TTATAATGGA 8520

TTGAAATAAG ATATGAACAA ATCGATTAGA ACATGATGGT AAAGCGTAAT CCCTTGTTTC 8580

TCAGCTTTCC CAGACAAAAA AGTCCAATAG TAAGTCAGCT GACTATCACT CTCTAGCACC 8640

CTATAAGAAG TTTCATCCGC ATGAAGTAAG GGCTGAGTCA ATAGTCTCTC TCGCAAGAGG 8700

TTATAAAGGG GCTCCAAATA GTATTGACTC GTCTTGATAT GCCAATTAGA GATTTCCTTA 8760

CGTGTGATTG GTAAACCCAT CCTAGCCCAA TCTTCTTCTT GGCGATAATT GGGTACCTTC 8820

AGATTAAACT TCTGATGGAT GGTGTGAGCG ATAATAGAAG CTGAGCCAAA GTTATGCGCT 8880

AAAGGGGCTT TAGGAATAGG AGCTTTCACA AGCTTATCCA GATGATTATC TTTTACTCGT 8940

TATGGACAAT GCTATATGGC ATAAATCAAG TACCTTAAAG ATTCCGACTA ATATTGGCTT 9000

TGCATTTATT CCTCCATACA CACCAGAGAT GAACCCCATT GAACAAGTGT GGAAAGAGAT 9060

TCGTAAACGT GGATTTAAGA ATAAAGCCTT TCGAACTTTG GAAGATGTCA TACAAGGACT 9120

GGAGAAGGAG GTGATAAAGT CCATCGTTAA TCGGAGACGG ACTAGAATGC TTTTTGAAAA 9180

CAGATGAGTA TAAAAAGAAA GTCCTCATTT CAATAGAAAT CACGACTTTC TGATGAATTT 9240

ATAGTAAAAT GAAATAAGAA CAGGATAGTC AAATCGATTT CTAACAATGT TTTAGAAGCA 9300

GAGGTGTACT ATTCTAGTTT AAATCCACTA TATTTGGGGA GTGATAGAAA AGCCCTTCAT 9360

CAGCCAATCT ACTTGTTCAG GTGCGAGAGC TTTGACATCC TTTTCTGTAC TGGACCAAGT 9420

CAGTTTTCCG TTCTCAAAGC GTTTATATAA TATCCAAAAT CCTTGACCAT CCCAGTAAAG 9480

AACTTTAAAG CGGTCTTTAC GTCCACCACA AAAGAGAAAG ACTTGATCGG AGAAAGGATC 9540

CAATTCAAAG TGGGTTTTAA CTACATAGGC TAATGAGTCT ATTCCCTGCC TCATATCTGT 9600

CTTGCCACAA ACAAGGTGAA CTTGACCTAA ATCACTTAGT TGAATTATCA TAGTACAATA 9660

CCTTTCCTCC GATAATTATT TTTTATCTGG TATACTGGAA GTTGGGGAAT TAGGATAGAT 9720

ACCTTGTTAT GACGCGCTTA CTATGAATTT GAAGTATAGT CTCCTAAATG CACTTAGCCC 9780

TTATTATAGG GCTTTTTGTT TTAATTATTC TAATCGAGTG AGACTGGGGA AAAAACAATT 9840

TCAGGAAAAA TCTAAGCCCT ATACAAAAAA GGAAGCAATT TGCTTCCTTT CTATTATTAG 9900

TTATTCAAGG CTGCTGCCAT TGTAGCTGCA ACTTCAGCTT CGAAGTCGTT TGCAGCTTTC 9960

TCGATACCTT CACCAACTTC AAAGCGAGCA AACTCAACTA CCGAAGCGTT AACTGATTCA 10020

AGGTATGCTT CAACTGTCTT GCTGTCATCC ATGATGTAAA CTTGTGCAAG AAGTGTGTAA 10080

GCTTGGTCAA CTTTAGTGTT ATCAAGCATG AAGCGATCCA TTTTACCTGG AATAATTTTG 10140

TCCCAGATTT TTTCTGGTTT GCCTTCTGCA GCCAATTCAG CTTTGATGTC AGCTTCAGCT 10200

TGAGCAATAA CATCATCAGT TAATTGAGCT TTTGATCCAT ACTTCAAGTG TGGAAGAGCT 10260

GGTTTATTAA CCATTGCACG GCTTTCGTTG TCTTGGTCGA TAACGTGATT CAATTGTGCC 10320

AACTCATCTT TAACGAATTG CTCATCCAAT TCTTTGTAAG AAAGAACTGT TGGTTTCATC 10380

GCTGCGATGT GCATTGACAA TTGTTTAGCA AGTGCTTCGT CTCCACCTTC AACAACTGAA 10440

ATAACACCGA TACGTCCACC GTTATGTTGG TATGCTCCAA AGTGTTGTGC GTCTGTTTTT 10500

TCAATCAATG CAAAGCGACG GAATGAGATT TTCTCTCCGA TAGTTGCTGT TGCAGATACG 10560

TATGCAGCTT CAAGAGTTTC ACCTGAAGGC ATTATCAAAG CAAGAGCTTC TTCGTTGTTA 10620

GCAGGTTTTC CTTCAGCAAT GACTTTAGCT GTAGTATTTA CCAATTCAAC GAATTGAGCG 10680

TTTTTTGCAA CGAAGTCAGT TTCAGCGTTT ACTTCAATAA CTGCTGCAAC ATTACCGTTA 10740

ACATAAACAC CAGTCAAACC TTCTGCAGCA ACACGGTCAG CTTTCTTAGC TGCCTTAGCC 10800

ATACCTTTTT CACGAAGCAA TTCAATCGCT TTTTCGATGT CACCGTCTGT TTCTACAAGC 10860

GCTTTTTTAG CGTCCATAAC ACCGGCACCA GATTTTTCAC GCAACTCTTT TACAAGTTTA 10920

GCTGTAATTT CTGCCATTTT AATTCTCCTA TATTTTTTGA AAATAGGAGA GCGCGGCTAA 10980

GCCCCGCCTC CGG 10993



(2) INFORMATION FOR SEQ ID NO: 16: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 8411 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16:

CGACGGGGAG GTTTGGCACC TCGATGTCGG CTCGTCGCAT CCTGGGGCTG TAGTCGGTCC 60 

CAAGGGTTGG GCTGTTCGCC CATTAAAGCG GCACGCGAGC TGGGTTCAGA ACGTCGTGAG 120

ACAGTTCGGT CCCTATCCGT CGCGGGCGTA GGAAATTTGA GAGGATCTGC TCCTAGTACG 180

AGAGGACCAG AGTGGACTTA CCGCTGGTGT ACCAGTTGTC TTGCCAAAGG CATCGCTGGG 240



TAGCTATGTA GGGAAGGGAT AAACGCTGAA AGCATCTAAG TGTGAAACGC ACCTCAAGAT 300

GAGATTTCCC ATGATTATAT ATCAGTAAGA GCCCTGAGAG ATGATCAGGT AGATAGGTTA 360

GAAGTGGAAG TGTGGCGACA CATGTAGCGG ACTAATACTA ATAGCTCGAG GACTTATCCA 420

AAGTAACTGA GAATATGAAA GCGAACGGTT TTCTTAAATT GAATAGATAT TCAATTTTGA 480

GTAGGTATTA CTCAGAGTTA AGTGACGATA GCCTAGGAGA TACACCTGTA CCCATGCCGA 540

ACACAGAAGT TAAGCCCTAG AACGCCGGAA GTAGTTGGGG GTTGCCCCCT GTGAGATAGG 600

GAAGTCGCTT AGCTTTAATC CGCCATAGCT CAGTTGGTAG TAGCGCATGA CTGTTAATCA 660

TGATGTCGTA GGTTCGAGTC CTACTGGCGG AGTAATTGAT AAAAGGGAAC ACAGCTGTGT 720

TCCTCTTTTT GTATCAATTT GTATCACCAA GCATTTTCAT AAGGAAGTCT GTTATTTCTT 780

GAGAACTTTC TTTTTTTCCA TGTGCAATCC AAGTTTGGCA GACACCAAAA AGTGCATGAG 840

TTAGATAGAT GCTACTATAT TCTAATTCAG TGGTATTTAG ATTCAGTTGC ATAAATCGCT 900

TTTGTAAATC TGTACTAAGC ATGATATGAA GTTTATTTCG TAAGAAATTT TGGATTTCTT 960

TAGTCCCATT TTCAGAAAGA AGGGCAGCCA GAAGTGGTTC TGACTCTAGA TATTCAAAAA 1020

CTTCTAAAAT AGCGTCTCTT TTGTGATGAG CATGTTTTTG AAAAATATAT TCAAATGTAT 1080

GGAATAGCTT GCTTTGATAG TGCTCAATCA TATCATACTT ATCCTTATAG TGAGTATAGA 1140

AGCTGGAACG ACTAATTCCG GCTTTTTCTA CTAATTTGAC AGTAGAAATT TTATCAAATG 1200

GCTGTTCCAT CAGTAATTGT ACCATAGCAT TTTCAATAGT TCGCTTTGTT TTTAAGCGTT 1260

TGTTACTTTC TTGCATATTT CCTCCTTGTA AACAAATTAG ACTATATGTC TAAAAATAGA 1320

TTTTTTATCT TGTAATTTAG ATTTTTTAAT GTATAATCTA TTATATCAAA ATTTTAGACA 1380

ATATGTTTAA AAAAGGAGAA ACTAAGTTTA AAGAATGGAA AGCAATTTAA AAAAAACCAA 1440

CCTTTATTAT TGTCATGATC GGGATTTCTC TTATTCCAGA TCTGTACAAT ATCATATTTT 1500

TGTCATCAAT GTGGGATCCA TATGGGCAAT TGTCTGACTT ACCTGTGGCA GTTGTAAATA 1560

ATGATAAAGA GGCTTCCTAT AATGGTAATA CTATGGCAAT AGGAAAAGAC ATGGTGTCCA 1620

ATTTAAAAGA AAATAAAACC TTGGATTTTC ATTTTGTAGA TGAAGAGGAA GGAAAGAAGG 1680

GATTGGAAGA TGGCGATTAC TATATGGTAG TGACTTTACC AAGTGATTTA TCTGAAAAAA 1740

CAACTACATT ATCCAATATT CAATCGACAG CAGCTTATCA ATCATTGACA AGTGAGCAAC 1800

AAACTGAGAT AAGTGATTCT GTATCTCAAA ATTCAACTGA TAGTATTCAA TCGGCTCAGT 1860

CAATTGTAGC TTTAGTACAA GATTTACAGG GAAGTTTAGA AAACTTACAA AATCAATCTT 1920

CTAATCTTTC GACTTTAAAA AATCAATCTA ATCAAGTATC ACCTATTACT TCTACTTCTT 1980

TGATAGGATT GTCAAGTGGA TTAACAGAGA TACAAGGAGA TGTTACTAGC AAATTAGTTC 2040

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CTGCCAGTCA GTCGATTGCA TCAGGTGTAA ACG C AT AT AC TACAGGTGTT GATAAAGTTT 2100 

CTCAGGGCGC AAGTCAACTA AGTGAAAAAA ATGCCACCTT GACAGGTAGT TTGGATAAAC 2160 

TAGTTTCAGG CTCAAACACC TTGACACAAA AATCTTCTAG ATTGACAGCA GGAGTTGGTT 2220

AATTACAATC AGGATCTGGG CAATTAGCAG ACAAATCCAG TCAGTTACTT TCAGGTGCTT 2280

CTCCATTAGA GAATAGAGCT AATAAATTGG CAGATGGATC TGGGAAACTA GCAGAAGGTG 2340

GAACAAAGTT AACTTCTGGA TTGGAAGATT TACAGACAGG ACTTGCTTCT TTAGGACAAG 2400

GACTAGGTAA TGCTAGTGAT CAACTCAAAT CAGTATCAAC AGAATCTAAA AATGCAGAGA 2460

TTTTGTCAAA TCCACTCAAT CTTTCAAAAA CAGACAATGA TCAAGTTCCT GTAAATGGAA 2520

TCGCAATAGC TCCTTATATG ATATCAGTTG CTCTTTTTTT GCAGCAATAT CAACAAATAT 2580 

GATATTTGCG AAATTGCCTT CAGGACGTCA TCCAGAGAGC CGTTGGGCTT GGTTGAAATC 2640

TTGAGCTGAA ATAAATGGTA TTATAGCTGT TTTGGCAGGA ATTTTGGTAT ATGGAGGAGT 2700

TCAGCTTATT GGTTTAACTG CTAATCATGA GATGAGAATA TTTATTCTCA TCATCCTAAC 2760

AAGTTTAGTA TTCATGTCTA TGGTGACCAC TTTAGCAACG TGGAATAGCC GTATAGGAGC 2820

TTTTTTCTCA CTTATTTTGC TTTTACTACA GTTAGCATCA AGTGCAGGTA CTTATCCACT 2880 

TGCTTTGACA AATGATTTCT TTAGATCTAT TAATCCCTGG TTACCAATGA GCTATTCAGT 2940

TTCGGGATTA CGACAAACAA TCTCTATCAA CAAGTCATTT TCCTAGCTGT CATACTAGTT 3000

CTATTTACTA GTTTAGGTAT GCTAGCCTAT CAACATAAGA AAATGGAAGA AGATTAAAAA 3060

AATCGACCGA TTAACTGGTC GATTTTTTAT GCCTTAGATG ACTTTCGTCT GTGATTATAG 3120

ATTCCAAATA GTAAGAGAGA AGTAAAGGAA CAGATTGCTC CAGTAATAAA ACCATTGGGA 3180 

ATGAAGGAAA GTGTAATAGT TCCTTTCCCC TTGGGAATGT CAACTTTCAT AAATCCAGTT 3240

TGAGCTTGTT TAATTTCTAT TTTCTTACCA TCTTGGTAGG CAGACCAACC TTTGTCATAA 3300

GGAATGGTGA AGAAAATAGA TGTATCTTGT TGGACATCAT ATGTAGCAAA AACCTTGTTT 3360

TTAGAAGTTG ATACTGTGAC AGGTTGTTCT TTAATTTTTT GAATTGCCTC GGTGAAAGTT 3420

TTGGTATCTA AACGATAGAA GGTAGGAGAT TCAAATGATA CTTGTGAATT TCCAGGGAAA 3480

CTAACATTGA TATTGAAAGT TTTTTTCTCT TTAGTATATC CTAGATTAAA GAAGGAGAAG 3540

ACATTATCAG TTGTAAAAGT CTTTTTTTCA CCATTTACAA GGATGTCAAC CTTCTTTTGT 3600

TTATCGTTAG AAAAGTGAAG GTTTATGAAA GAGAGATAAA CTTGGCTGTT TTCTGGAACT 3660

TCAATTTGAT ACTGGATTGC TGCATCTTCA TTTGAAGAAC TTGTGACACT AATCAAATCA 3720

TTAGTATTTT CTATTTTTTC TGTTTTTTCA TAAGGTATTG GAGAAAAATA ATCAAAATTG 3780

ACGTTAGCAA GTTGATTTAA AAATGAGGCC TGATTATCCA AGGTATGTTC ATTGAACTTG 3840

ACATCATTGT AAACAGATTG ACTCGCAACT GCAATCGGAA GAGAGTATTG ATTTTCATAT 3900

AGGGTAAGAT TATCTTTTTG ATAGATATCT TTAAAGCCAT ACTTATCAAT AGGACTGTCT 3960

GAGATATTGT ACTGGATACC AAATAAACTA TCAGCCAAAA TACTATTATT TGCATATCGG 4020

AGATTGAGAT TAGTCCCAGA GGATTTAAAA CCAAGTTTAT CTAAAGTAGA GCTTGATGAA 4080 

CGATTTCGAA CAGATGAAAA TTGAGAGATT CCATTGTAGT TGAATTTCAT ACTGTCATTT 4140

CCTGTCTGAG TTTGTAGTTT TTCAGTACGA GTAAATTGAT TTCCAATATA TGTTGAGAAA 4200 

GATTCCATAG CTGGGATATC TCGACTATAA GCACTTCGAG AAGCAAATCC CCATTCCTTA 4260

GCAATTCCGT CCATTTGAGA TGAAGCATTT AAACTCATTT CAACCAGTAT AAATAAAGAG 4320

ATTAGAATGG CAAATAGATT CACAGATATA AACTTTTTGA TAACTGCAAG GAGTAAAAGA 4380 

GAATAGACAA CCAAAAATTC AAGAGTAAGC AGAATATTCA AATCTGTTAA AAAAGAATAA 4440 

TGCGATTTTA GATAGATGGT AGCTAAAAAT CCTGCTACTA CAAGAAAAAG CGAAACTAAA 4500

AAATTCCAGA CTTTAAGTTC TTTCAGACGC TTTAAGACTT CTGCTGCTGT GTAAATTAAC 4560

AAGGTAGAGA AAATCCAAGC ATAGCGATGT AAAAACATGT TTGGAGTATG CATGCCTTGC 4620

CAAAATAAGT CAAGAGCTTC TATGTAAAAG CTTGCAATTA GAAATGCAAA GAATATTACA 4680 

TATATGAGTT TCACGTGAAA CTTAATAGAT TTCAGCGTAA AAAATAAAAT GGTCAAAATA 4740

AAGGGAAATA GTCCAACAAA AATCATTGGG ATGGCCCCAT ACTTTGTTGT GTCAAAGGAA 4800 

CCAATGAATT GCTTAGCAAA GAGATCAAGA TACCAGCTAC TTTCAGTTTG AAACTTTGTA 4860 

ACTTCAGTCA ATTTTTCCCC ATGTGTCTGT AAATCAAATA GAGTGGGAAG AGTCATAATC 4920

AAACTAGCCA TACCAGCTAA AAAGGAGATA ACTATGAAAT CAAGAACAGA TGATTTTCGA 4980 

GTCTTAAAGT CCCACGAAAT TTGACAGAGA TACCAGAAAA TAAGAAACAA TACTGTCATA 5040

TATCCAAAAT AATAATTTTG AATAAATAAG ATTGACAGAC TTGTAAAGTA CAATAGGAGT 5100 

TTCTTTTCAG TTATCAGTAG ATGTAAACCA GTTATAATTA AAGGAATCAA GATAAAAACA 5160 

TCTAGCCAGG TTTTTATCTC TAATTGACTG ACAGTGAAAC TCATCAGAGC ATAGGAAGTA 5220

GATAAGGCTA GTTTTAAAAT CTGAGGGATA GATTGAAACA ATTTATTCAA ACTAAAAAAG 5280 

GTTGACAGAC CAATCAATCC AAATTTTAAG AGAGTTGTCA GATAGATAGC ATCTGGCATA 5340

TTCGTTAGAT CAAAAAAGTA AACCAGAGGC GCGAGAAAAC TACCCAAGTA ATAACTAGAT 5400 

AGGGCATAGA AGTTTAGCCC TAGACCACTT GTAAAGGTGT AAAACAGATT ACTATTTCCA 5460

TGTAGGATAT TTCGTAAGGC TACATCAAAA ATAACGTATT GATGAAAGCC ATCTCCTAAT 5520

AGAGGAGAGT TGTCGCTATT CCAGTAGATA CTTTGAGATA GATATACTCC AGACATAATC 5580

ACTACAGGAA TGATGAAAGA AATAAAATAG GTTCGATATG TTTTTAAAAA TGATTTCATG 5640

TTACCTCGTA GAATGATAGA AAACTCAGTT GGTTAACCCA ACTGAGTTTT GAAGTTTTAT 5700

TTAGTCTTTC CAAAGTTCTT TAACTTTTGC TTGTACTTCT GCATTTTCTA GGAATTCATC 5760

GTAGGTTTCA TCGATACGGT CAATGACGCC ATTTTTAGAT AAGACAATGA TATGGTTAGC 5820

CAAAGTTTGA ATAAATTCGT GGTCATGGCT GGCAAAGATG ATTGATTCTT TAAAGTTTTT 5880

CAATCCATCA TTCAAGCTTG AGATAGATTC CAAGTCCAAG TGATTTGTTG GATCATCAAG 5940

TACAAGGACA TTTGATTTTA AGAGCATGAG TTTTGAAAGC ATGACACGAA CTTTTTCTCC 6000

CCCTGACAAG ACATTTACAG GTTTGTTAAC TTCATCTCCA GAGAAGAGCA TACGGCCGAG 6060

GAAGCCACGT AGGAAAGTAT TGTCATCTTC TTCTTTACTT GCGAATTGAC GCAACCAGTC 6120

AAGAATTGAT TCTCCTCCTG CAAAATCAGC TGAGTTATCT TTTGGTAGGT AAGATTGACT 6180

AGTTGTAACT CCCCACTTGA CAGTTCCTTC ATAGTCAATA TCTCCCATGA TTGCACGAAT 6240

TAATGCAGTC GTTTGAATAT CATTTTGTCC AATAAGTGCT GTCTTATCAT CTGGACGCAA 6300

GATGAAACTA ATATTATCCA AGATAGTTTC ACCATCAATC TTTACAGTTA AATTTTCTAC 6360

TGTCAAGAGA TCATTACCAA TCTCACGTTC CGCTTTAAAG TTGATAAATG GATATTTACG 6420

ACTAGATGGC ACAATCTCTT CTAGCTCAAT CTTATCAAGC ATTCTCTTAC GTGATGTTGC 6480

CTGCCTTGAC TTAGAAGCAT TGGCAGAGAA ACGAGCAACA AATTCTTGCA ATTGTTTAAT 6540

TTTTTCTTCT GCTTTAGCAT TACGGTCTGC TAGCAATTTA GCAGCAAGCT CAGAAGATTC 6600

CTTCCAGAAG TCGTAGTTTC CGACATAGAG TTTGATTTTT CCAAAGTCAA GGTCGGCCAT 6660

GTGAGTACAA ACTTTGTTTA AGAAGTGACG GTCGTGGGAT ACTACGATAA CTGTGTTATC 6720

AAAGTCAATC AAGAAGTCTT CTAACCAAGT AATCGATTGG ATATCCAAAC CGTTAGTAGG 6780

CTCGTCCAAG AGAAGAACAT CTGGTTTACC AAAAAGTGCT TTGGCGAGGA GAACCTTTAC 6840

TTTTTCACCG TTGGCCAATT CGCTCATGTT TTGGTAGTGT AATTCTTCTG GAATGTTTAG 6900

GTTTTGAAGT AGTTGAGAGG CTTCACTCTC TGCTTCCCAA CCTCCAAGTT CGGCAAACTC 6960

TCCTTCGAGT TCGGCAGCAC GAACCCCGTC CTCGTCTGAG AAATCTTCCT TCATGTAGAT 7020

AGCATCTTTC TCTTTCATGA TGCTATAAAG TTTTTCATTT CCCATGATAA CGACATCAAT 7080

GGCACGTTCA TCTTCGTAGT CAAAGTGATT TTGACGAAGA ACAGAGAGAC GTTCATCTGG 7140

ACCAAGAGAG ATGTGACCAG TAGTAGGTTC GATATCTCCA GCTAAAATTT TTAAAAAGGT 7200

TGATTTTCCG GCACCATTAG CACCGATTAA TCCGTAAGTA TTTCCTTCTG TAAATTTGAT 7260

ATTGACATCA TCAAAAAGTT TGCGATCACT AAAACGTAGT GAAACATCAG ATACTGTAAG 7320

CAATGTTTTT CTCCTATATG TGTAATATAT TTATTCTACT AGAAAATACA GAAATATTCA 7380

AATTTTTATT TGTCAATTTT GTGTAAATTA TATTTACAGT ATCCTTTACA CAAATCTGTA 7440

AAAAGCAAGG CTGATTTATT TTGATAAATT ACGGTTATTT CATTAAAAAA ATGCTATAAT 7500

TGAAAGGACT ATATCGAAGG AGAACAAAAT GACTAAACCC ATTATTTTAA CAGGAGACCG 7560

TCCAACAGGA AAATTGCATA TTGGACATTA TGTTGGAAGT CTCAAAAATC GAGTATTATT 7620

ACAGGAAGAG GATAAGTATG ATATGTTTGT GTTCTTGGCT GACCAACAAG CCTTGACAGA 7680

TCATGCCAAA GATCCTCAAA CCATTGTAGA GTCTATCGGA AATGTGGCTT TGGATTATCT 7740

TGCAGTTGGA TTGGATCCAA ATAAGTCAAC TATTTTTATT CAAAGCCAGA TTCCAGAGTT 7800

GGCTGAGTTG TCTATGTATT ATATGAATCT AGTTTCGTTA GCACGTTTGG AGCGAAATCC 7860

AACAGTCAAG ACAGAGATTT CTCAGAAAGG ATTTGGAGAA AGCATTCCGA CAGGATTCTT 7920

GGTCTATCCA ATCGCTCAAG CAGCTGATAT CACAGCTTTC AAGGCTAATT ATGTTCCTGT 7980

TGGGACAGAT CAGAAACCAA TGATTGAGCA AACTCGTGAA ATTGTTCGTT CTTTTAACAA 8040

TGCATATAAC TGTGATGTCT TGGTAGAGCC GGAAGGTATT TATCCAGAAA ATGAGAGAGC 8100

AGGGCGTTTG CCTGGTTTAG ATGGAAATGC TAAAATGTCT AAATCACTAA ATAATGGTAT 8160

TTATTTAGCT GATGATGCGG ATACTTTGCG TAAAAAAGTA ATGAGTATGT ATACAGATCC 8220

AGATCATATC CGCGTTGAGG ATCCAGGTAA GATTGAGGGA AATATGGTTT TCCATTATCT 8280

AGATGTTTTT GGTCGTCCAG AAGATGCTCA AGAAATTGCT GATATGAAAG AACGTTATCA 8340

ACGAGGTGGT CTTGGTGATG TGAAGACCAA GCGTTATCTA CTTGAAATAT TAGAACGTGA 8400

ACTGGGTCCG G 8411



(2) INFORMATION FOR SEQ ID NO: 17: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 9064 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17: 

TGCCGTACTC AAGTACAGCC TGCGCTAAGT TTCCTAGTTT GCTCTTTGAT TTTCATTGAG 60 

TATTAGTAAC CAAAATCCGA CCACATAGCC AGCCCCTATG AATATAGCCA TTAAAGCTAG 120

CATGGAATTT AGGAAATTAA AAACCACCGC AGATACAAAG GTTAGCACAA AAACATTAAA 180

AGCAATGGTG TCAGAAGCCA AGACTAGAAT ATAGGGTGTC AACCGATCTA AAGTTTTGGA 240

ATCTAGGAAA AATAAGTGTT TATACATGAT GACCTCCTCT ATGGCTGAAA AGCAAGCCTT 300

TTGTTTTTTT ACCCCAAGAC CCTATGTAGA AAAGTGAGCA AAAACGGGAA GGTCGCTACA 360

ATATTATTGA TCACATGCAC CGCATAGGAT GGATAAATGC TCTTGGTATA GCGGGTCAAA 420

CCAGCAAAGA TGATTCCAAC TGTTGCAAAG ACGAAGATAT CTAACAGACT AGGCAGGCTT 480

GAAAAATGAG GGAGAGCAAA TAAAATAGAA GGAAGAAGCA AATCAAGACC AAATCGCGAA 540

TGCTTAAAGA AAGCATGTTG CAGTAATCCT CTATAAATCA ATTCTTCCAT CAGTGGAACC 600 

AGAAAGAACA GGGCTATATA AATACCTAGC TCTGCAAAGT TAGTCCCACT ATAACCAATC 660 

AATACAGCCC AACCTTCCGC AGTTGACTGA ACATGTTTAG CTGTCTGAAC GTTAAAAGAG 720

ATCTGGAACA CTAGCACTAA TACTGTCAAA ATCGAATACC AAAGCCATTT TTTTCTTGGA 780

ATGCGGAAGA GATAACCATG GCCTGTCTTA ACAAGAACCA CAATCATGAC TCCAATAAAA 840

AGTAAACTCA AGATATTTTG AATCCAGAAT AAATTGCCTA TCTGAGAAGA AAATTGCCAA 900

TAGTTTTGGA CGATAAGCGT CAGCTGAGAA AGACTAAATA CGAAAAATAA GTAAGAGAAG 960

ACTGCACTTA TTTTGAATAG AAGTTGATAC TTTTTCATAG AAATCCTCCC TACTATGACC 1020

TCACCTTGTC AGGCTCTACT GCTGTAAGAT TAAGAAGACA GTTTGTTTTT TTTAAGGCTA 1080 

ACCTGACTAC TAGATAATAG ATACATTAAG GCATTAAAGA CAATGAAAAT ATGTCCATAG 1140 

AATAAAATCA ACCTCGCATC CAAACCAAGA TAAAGTTTGA TTATCAAAAA GATGAGCAAA 1200

AGAATTTGAA ACCATAAGGT TTTTCCAAAA ATAAATTTAA AGCGATTTCG AATATCTACT 1260

TCCTTGATTT TTACCGCCAC CCCTTTATTA GCAAGAAGGA AAACTCCTGC TTCAAACAAA 1320

CCACTGTAAA GAACAAGCCA CCCAATAGAT ACGATAGAGA TTTGTAAAAA TGTCCCTAAA 1380

AGAATATCCA ACACACTACT CAAGAAAATA ACAAAAAATA ATCTGTATTT CATATTAAAT 1440 

ACCTCCATTC ATTTATTTCA CTAACAATTT AATAGAGCCT TCTACTCAAA TATCCTGTCA 1500

GAAAAGGATA GAAAGCTACT TTTTATAATA CTTCAAGCCC CACATGAGCA GAAGCGTGAT 1560

AAACAAGCAG AGAATACACC TATATAAGCG ATTAGTTGTT GATAGAATTC TGTTTCTGAA 1620

ATACCTCTAT ACAAACAAAT GACAAACATA AAATCTGCCA AGCCGATAAA CATAAGTTGA 1680

TTGGTTCTAG GACTAACCAA ATCATCATTT ACTTATATTT AAGAGTATCT CTTTTATTTT 1740

AATGTATGTT AGCACTGAAA AGCAAGACAG GCCAATAATA TTTAAAATGA ACAGTAACGG 1800

GGTTAAGTCT CTAAAAAAAT TATCTACTGA CACTACAAGA AATACTATAC ATATTATAGT 1860 

CGAAACTATC TTTTTCTTAT CCATAATTAT TTACTCCTTT CCTAACAAAT CCAGCTTATC 1920

AATCAAGAGC GATTTTTAAC ATAATGTAGC AGCACCCGTT GCAACTTTGA CAAGTTTAGT 1980

ATATCATTGT TTTTTAAAAT TTTTCATCCA AATCTTGAAT TGTCATCGAA ACATCTTGAA 2040

TTGTTAAAAA ATTTAAAAAG TAAGCATTAA AAACATACTT TCCTCTTTAT ATTGTATTGA 2100

TACCAACTTG TTTGTAGACT TTTCATCCTG CTATCACATA TCATTTTGAC AGGCGAAACA 2160 

ATATTAAAGA AACTCCCCTG TAAATTAAGC TAGCAAATAC AGGGGAGAAA TTTATTTTTT 2220

AGAGAGTACT ATCCGTATCC TTTTTGGAAG ATTTTGAAAA TATTTTTCTA ATTAAGTCAT 2280

GCATATAAGG ACCAAATATA CCAACTACTA AACCAATAAT AAAACTTTTA AAATCCATAA 2340

TTACCACCAA CATATTGCTG CATAGGCTAC ACCTCCAAGT ATAGCTCCAC CTGCAGCACC 2400

AGTTACACCT ATTCCTATAG CAAATGGTCC CAATAGAAAT GTCAAACCGT TGTTGCACAC 2460

CCATCAATTG CGCCATATGC AACCCCTGCT GCACAACTAA TTTTTCTTCC CCAATCAATA 2520

TCTCCACCTT CAACGCAAGC AAGCATTTCA TTATCCATAA CTGCAAATTG TGACATCATT 2580

TTTGTATCCA TATAGTGTAT CACTTTTCAG TTACGGAACA AGTTTAATAT AAAAATTATC 2640

AAAAAAACAT AGGCAATAAA GAGAAAAATT AATTTATCAT AGATTAGAAA TAATATGACA 2700

AAACAATTCA ATGATGTTAA TTCAATAGTC TTTTGTTTTT TATCGGAGAT ACTTATGGAT 2760

AGATAAATAA GATAGGTTTG AAAAGCGAAG AGAATAATAA AGAATATAGC CTTCATAAAA 2820

TTTAGCTTTC ATTTTTATGA TGTAGCGGTA TAGGCTAAAT ATCCACAAAC CACTGCTCCT 2880

CCAATTCCTC CTATTGCAGC GCCCCATGGT CCTAGAAGTC TCCCATATTT CACTCCACCC 2940 

GCTGCACAAC CTAAAGCAGC AACTACAGCT GCTCCTCCGG AATTACCTCC ATAAACCTCA 3000

CTCAGCATTG TTTCATTTAT ATTACAATAA GTATTCATAC AAGTCTCCTT TTATTAAAAT 3060

CCACCCGTTG CCCCTGTTAC TCCTGCCCAA AGATCCACAC CAAATTTAGC TCCTATGTAT 3120

CCACATGCTC CCATAAATGG TGCTCCAACA CCACTCGCAG CACAAATAGC TGTCCCTAGC 3180 

CCCCAGCCAC CAAAAGCAGC ACCACCACCT TCTAAGACAT TAGTTTGCCA ATTATTCTTG 3240

CCTCCTTCAA TACTAGATAA CATAGTTATA TCCATTTCAT GAAATTGTTC CATAATTTTT 3300

GTATCCATGA CAAATACTCT TTTTTATTTT TAATTTTTGT CTTGTTGTAA CTTTGACAAG 3360

TTTAGTATAT CATCGTTTTT TAAAATTTTT CATCCAGATT TTGAATAGTC ATCGAAACGT 3420

CTTGAATTGC AAAAATTACA TTAGACTTCC TGCAAAACTA GAATCCTAGT TCATGATTGA 3480

TAATACCAGC ACTCAAATTC ATTCGTAATC CGAAGCGTTT ACGATGACTT CGATAGGTTG 3540

TTGAAAACAT TTTAAACGTT TTTACTTTGG CAAAGATGTT CTCAACCTTG CTTCTCTCCT 3600

TAGATAGCGC ATGGTTACAG GCTTTATCTT CAACTGTTAG CGGTTTGAGT TTGCTGGATT 3660

TACGTGAAGT TTGTGCTTGA GGATATATCT TCATGAGCCC TTGATAACCA CTGTCAGCCA 3720

AGATTTTACC AGCTTGTCCG ATATTTCTGC GACTCATTTT GAACAACTTC ATATCATGAC 3780

AATAGTTCAC AGTGATATCC AAAGAAACAA TTCTCCCTTG ACTTGTGACA ATCGCTTGAG 3840

TCTTCATAGC GTGAAATTTC TTTTTACCAG AATCATTCGC TAATTCTTTT TTTAGGGCGA 3900

TTGATTTTTA CTTCCGTCGC ATCAATCATT ACCGTGTCCT CAGAACTGAG AGGAGTTCTT 3960

GAAATCGTAA CACCACTTTG AACAAGAGTT ACTTCAACCC ATTGGCTCCG ACGGAGTAAG 4020

TTGCTTTCGT GAACACCAAA ATCAGCCGCA ATTTCTTCAT AAGTGCGGTA TTCTCGCACA 4080

TATTGAAGAG TGGCCATAAG AAGGTCTTCT AGGCTTAATT TAGGTTTTCG TCCACCTTTT 4140

GCGTGTTTAA GTTGATAAGC TGTTTTTAAT ACAGCTAGCA TCTCTTCAAA AGTCGTGCGC 4200

TGAACACCAA CAAGACGCTT AAATCGTGCA TCAGTTAGTT GTTTACTTGC TTCATAATTC 4260

ATAGAACTAT AGTAAAATGA AATAAGAACA GGATAAATCG ATCAGGACAG TCAAATCGAT 4320

TTCTAACAAT GTTTTAGAAG TAGAGGCGTA CTATTCTAGT TTCAATCTAC TATACTATAC 4380

CATATTTTGT TTCGCAGGGA ATCTATTATA AAAGGGTAAG TATTGCAAAA ACACTTACCC 4440

TTTTCTTTTA TACTTCATTA AGCTCTACTT TTTATAATAC TTCAAGCCCC ACATGAGCAG 4500

AAGCATGATG ATTAAGCAGA GAACAGCGCC AATATAAGCG ATTATTTGTT GGTAGGATTC 4560

TCCTGCTGTG ATACCTCTAT ACAAACAAAT AATAGACATA AAACCTGTCA AGCCGATGAA 4620

CATAAGTTGA TTGGTTCTAG GACTAACCAA ATCATCATCT TCAAACTCTC TTATCCTCAT 4680 

TTCCCTAGTG AGATAAACAG TAACCAAAAT AGAAGCCAAG TTAATAACTA CTAAAAGAAA 4740

TTGGAAAACT ACGGAAAAAT TTAAAAACTG ACGAGATAGA AATAGATAAG TAGAAACAAG 4800

CAAGGGCAAC TGACCTAAGA ACAATCTCGC AAGGAAGATG TTCCGTTTTT TAGCAAGAAA 4860

AGTTTTCATT TCTTTTCTCC TTTCTTTTTA TTGATAGCAA AATAGATCAT AACTGCAATC 4920

ACATAGGCTA TGGTATAAAA TAGCTGATAC CAAGCACTCT CCCTAAGCGG ATATAGAAAG 4980

ATGGACATGA TTAGATACAG AACGAAAATA ATCAGTATTT TTTTCTTCAT AAGATTTCCT 5040

CCTAAATGTG CGATTTATCT TAGTTGAGCA AGAACATTTA CACTGCTAGT ATAGCACTTA 5100 

TTTTGACCTT GGATCACTCA AATCATAAAT GGTCATCAAA ACCTCTTGAA TTGTAAAAAT 5160

TAAAAAAGCA AGCATGAAAA ACATACTTTC CTCTTTATAT TGTATTGATA CCAACTTGTT 5220

TGTAGACTTT TCATCCTGCT ATCACATATC ATTTTGACAG GCGAAACAAT ATTAAAGAAA 5280

CTCCCCTGTA AATTAAGCTA GCAAATACAG GGGAGAAATT TATTTTTTAG AGAGTACTAT 5340

CCGTATCCTT TTTGGAAGAT TTTGAAAATA TTTTTCTAAT TAAGTCATCC ATATAAGGAC 5400 

CAAATATACC AACTACTAAA CCAATAATAA AACTTTTAAA ATCCATAATT ACCACCAACA 5460

TGTTGCTGCA TAGGCTACAC CTCCAAGTAT AGCTCCACCC GCAGCACCAG TTGCTGCACC 5520 

TTGCCATGTT CCTGTTTTAA TGCCTAGTTG AAGACCTCTT GCTGCTCCTC CTCCAACACC 5580 

TGCTTTGGCA AAATCTCCCC AATTGCATCC GCCACCTTCA ACGCAAGCAA GCATTTCAGT 5640

ATCCATAACA GAAAATTGTG ACATCATTTT TGTATCCATG ACAAATACTC CTTTTTTAAA 5700

AAACTAAAAT AAATCAGAAT AGAATCCTCA TAATTTTACT ATAAGTCTTA CCAACTTAGT 5760

CCCAATTTAT CACCAACCAT ACCTCCTAAG CATGTTAATC CACCCCCAAT TGCACCAATG 5820

TGTGCTCCAA CAAATGCACC AGCAAGTCCA GCTACTCCTA AAGTGGCCAA ACCTGCTCCA 5880 

GTTCCACCAG TTATAATTCC CGTAGTGACT CCTGTAATCA GTGCATTTTG ACAATCAGTG 5940 

GAGCTATACC CCCCTTCAAC TTTCGCAAGC ATTTCAGTAT CCATAACCTC TAACTGTGAC 6000 

AACATTTTTG TATTCATGAT GAATACCTCC TTTTTATTTT CAATTTGTTA CCAAAGTCTT 6060

AAATTCAATA AACAAATAGA TTTTTTATAG TATCTTTTTG ATTTTCTTAA AAAAGTATAT 6120

ACGTCTACTA TCTTCTTAAA GGTAGCAGTA CCTATTTTTT AGTCTAAGAT TTCAATAATC 6180 

TTGAGTATCT AAAATATCTT AATTTCGTTA TTCTCCTTGC AATAAAAAGT TTTACTATAC 6240

TATTTATTAA CTTGCAGAAA GCAAAAAATA TTAGTAAATA ATAGTTTATA GTTAAGTTTT 6300

TTATTCCTAC CAATCCATCA ACTAAGTAAA GCATCAACGA TTACATAAAC GATTGATAAT 6360

ATAATTAAAA TTTTGCTAAC TATCTTATTC TCATCATTCT TAGATAACTT TGATATTTTG 6420

TAAGTAAGTA AATAAGACAG TAAATTAATA GCGATAATAA TACTATATTT AAGAATCATA 6480

ATCTTACAAA GAGGACATAA TTCCTGAACC TACACAAATA AGTGTTGCTG CTCCCCCAGT 6540

TATCGGACCA GTCGCAGCAG CTAATAGTAC TGCTCCAATA CAACCACCGA TTGCAGATCC 6600

TAAATTGCCT CTTCCTCCAC TAACTATTTC GAGTTCTTCA TTATCCATAA CAGAAAATTG 6660

TTCCATCATT TTTGTATTCA TGACAAATAC TCCTTTTTTC TTTTTTTATT TTTGTCTTGT 6720

TGTAACTTTG ATAAGTTTAG TATATCATCG TTTTTTAAAA TTTTTCATCC AGATCTTGAA 6780

TTGTCATCGA AACGTCTTGA ATTAGCTTTT TTATTTCAAG CCACCTCTAA ATGTTTAAAA 6840

AAAATAATTT CTAATCACTT TTTTACCATT CAGGAAGTTT TAATGACTAT TCAAGATTTC 6900

ATAAAATATG AACTTAGTTT TATGACATAA TAGACCTATC CACTATATGA AAGGAATTGC 6960

CAATGACTTC TTATAAACGT ACATTTGTTC CTCAAATAGA TGCGAGAGAC TGTGGTGTCG 7020

CTGCCTTAGC CTCGATTGCT AAATTCTATG GTTCAGATTT TTCTCTAGCT CACTTGAGAG 7080

AACTTGCAAA GACCAATAAA GAAGGGACGA CTGCTCTTGG CATTGTAAAA GCCGCTGATG 7140

AAATGGGCTT TGAAACAAGA CCTGTTCAAG CAGATAAAAC GCTCTTTGAC ATGAGTGATG 7200

TCCCCTATCC ATTTATCGTT CACGTTAACA AAGAAGGAAA ACTCCAACAT TACTATGTTG 7260

TCTATCAAAC AAAGAAAGAC TATCTGATTA TTGGTGATCC TGACCCTTCT GTAAAAATCA 7320

CTAAAATGTC AAAAGAACGC TTTTTCTATG AATGGACTGG AGTAGCTATT TTTCTAGCTA 7380



